I. INTRODUCTION


A. BACKGROUND


The two dimensional scatter plot has been hailed by many
statisticians as the single most powerful tool used in
exploratory data analysis [Ref. 1]. A scatter plot presents an
entire data set in a compact, unambiguous and easily
understandable format, in which one of the following usually
holds:

1. the points lie in a nearly straight line;

2. the points almost lie on a smooth curve;

3. the points are scattered without any apparent correlation
between the X variables and the Y variables;

4. the points lie somewhere between (1) or (2) and (3);

5. most of the points lie near a straight line or smooth
curve, but a few outliers are separated from the rest.
[Ref. 2]

These patterns, or other hidden peculiarities, are much easier
to discover during a brief glimpse at a well prepared scatter
plot than during an examination of a data table. For example,
the strong positive correlation between total users and active
users logged on to the W.R. Church computer system (Figure 1.1)
is more easily discerned from the plotted points than from the
tabulated data.¹ This is a good example of case (1), described
above.

Not only does this plot reveal the positive trend in the data,
it also shows that the trend is nearly linear, and it provides a
rough estimate of the relationship between the variables.


¹ The table in Figure 1.1 contains only a small portion of the
472 data points included in the plot. A complete listing of the
data set takes approximately two pages of text and is not
required for demonstration purposes.

Figure 1.1 Comparison of Data Presentation Methods.

More precise mathematical expressions and confirmatory
procedures, including goodness of fit measures, can be obtained
by employing classical regression analysis techniques, a logical
enhancement of simple scatter plots (Figure 1.2). Numerical
quantifications such as the Pearson product moment correlation
also provide summaries, but they can be ambiguous if not
accompanied by other information [Ref. 1, p. 77].
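As a small illustration of that ambiguity, the Python sketch
below (hypothetical data, not the W.R. Church data) fits the
least squares line and computes the Pearson correlation for
points that lie exactly on a symmetric parabola. The relationship
is perfect, yet the fitted slope and r are both zero and the
intercept is simply the mean of Y (10). The helper names are
illustrative only.

```python
import numpy as np

def least_squares_line(x, y):
    """Intercept and slope of the ordinary least squares line of y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

def pearson_r(x, y):
    """Pearson product moment correlation coefficient of x and y."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical data: Y is an exact, but nonlinear, function of X.
x = np.arange(-5.0, 6.0)          # -5, -4, ..., 5, symmetric about zero
y = x ** 2

b0, b1 = least_squares_line(x, y)
r = pearson_r(x, y)
print(f"intercept={b0:.3f}  slope={b1:.3f}  r={r:.3f}")
```

A plot of these eleven points would reveal the curve at once,
which is exactly the advantage of the scatter plot described
above.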

Scatter plots are not invulnerable to misinterpretation. When
the scatter of points falls into category (4) or (5), as in
Figure 1.3, it may not be possible to judge the true relationship
between the variables during a quick glance at the scatter plot,
although there obviously is some relationship. Figure 1.3
contains a plot of the first 200 points of test set two
(Appendix C), which is used in Chapter III, Section 2 to test
LOWESS' ability to follow abrupt changes in curvature.
Figure 1.2 Linear Least Squares Regression of Active Users on
Total Users Logged on to the W.R. Church Computer System.

Figure 1.3 Scatter Plot of the First 200 Points of Test Set Two.


Initial inspection of this data suggests the presence of a
quadratic type pattern. This impression leads naturally to using
the quadratic least squares regression curve of Figure 1.4 to
describe the dependence of Y on X. The accompanying analysis of
variance table lends some support to this choice, since
r² = .709.

A closer examination of this data reveals, however, that
although it looks quadratic, the actual dependence of Y on X is
not described quite that simply. Figure 1.5 demonstrates this
point very clearly. Splitting the data set into three parts at
what appear to be logical break points (X = 10 and X = 25), and
fitting a linear least squares regression line to each part,
shows that no single simple function describes Y over the entire
range of X. In fact, there appear to be three separate linear
trends in this data.

[Figure 1.4 panel: fitted quadratic Y = +/C × X*0 1 2, that is,
Y = C0 + C1·X + C2·X²]

Figure 1.4 Quadratic Regression on the First 200 Points of Test
Set Two.

Analyses of this type are seldom undertaken because of the
tedium involved in selecting appropriate splitting points, even
once it has been determined that doing so may be helpful.
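The split-and-fit analysis of Figure 1.5 can be sketched in a few
lines of Python. This is a minimal illustration, assuming the
split points are chosen by hand (10 and 25, as in Figure 1.5);
the helper `piecewise_linear_fits` and the noise-free data are
hypothetical, not test set two itself.

```python
import numpy as np

def piecewise_linear_fits(x, y, breaks):
    """Fit a separate least squares line to each segment of the X axis.

    Segment k contains the points with breaks[k-1] < x <= breaks[k]
    (open-ended at both extremes). Returns a list of (intercept, slope).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = [-np.inf, *breaks, np.inf]
    fits = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x > lo) & (x <= hi)
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        fits.append((float(intercept), float(slope)))
    return fits

# Hypothetical noise-free data with three linear pieces joined at X = 10 and 25.
x = np.linspace(0.0, 40.0, 201)
y = np.where(x <= 10, 0.4 * x, np.where(x <= 25, 3.0 + 0.1 * x, 10.5 - 0.2 * x))
for intercept, slope in piecewise_linear_fits(x, y, breaks=[10, 25]):
    print(f"Y = {intercept:.2f} + {slope:.2f} X")
```

Choosing `breaks` is exactly the tedious step described above:
the function only does the fitting, and every change of break
points means refitting and re-inspecting the result.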

How, then, can an analyst discover the existence of subtle trends
or define the shape of unusual patterns contained in a scatter
plot? The answer is to use local smoothing procedures rather than
global (regression) fitting techniques. Using a flexible
smoothing procedure that responds to local changes in the data
structure allows the data itself to determine the shape of the
final curve, as opposed to the classical approach of fitting
polynomials, which have predetermined shapes.

Figure 1.5 Linear Regressions on First 200 Points of Test Set
Two Split at X = 10 and 25.

The Robust Locally Weighted Regression and Scatterplot Smoothing
(LOWESS) procedure [Ref. 3], described in the remainder of this
paper, is a very good method for avoiding untested assumptions
like the one that led to using the quadratic model in Figure 1.4.
The LOWESS smoothing technique applied to this data (the right
hand plot of Figure 1.6) shows very clearly that the dependence
of Y on X resembles a combination of three distinct linear
functions (the parameter F = .25 will be explained later).


The LOWESS smoothing process has a tendency to round angular
corners. The straight lines in the center of each segment
suggest linear trends similar to those contained in Figure 1.5.

The major problem with trying to use polynomials to depict
subtle trends or to describe unusual relationships in a data set
is that they are neither flexible nor local. For example, the
points at either extreme of the first of the two plots in
Figure 1.6 have a significant effect on the middle of the fitted
polynomial.


[Figure 1.6 panels: QUADRATIC REGRESSION (left); LOWESS, F = .25 (right)]


The LOWESS procedure, on the other hand, allows the data points
themselves to determine the shape of the smoothed curve.
Figure 1.6 also demonstrates that global polynomial regressions
have a more difficult time following abrupt pattern changes than
do local smoothing procedures.

B.  SCOPE 

Locally Weighted Regression and Scatterplot Smoothing (LOWESS),
introduced by William S. Cleveland in 1977 [Ref. 3], is a
generalized extension of the locally fitted polynomial smoothing
techniques used for many years in the field of time series*
analysis.

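The essential idea behind the simplest of these classical
smoothing techniques is the following. Suppose the data points
(Xi, Yi) come from an additive model of the form

$$Y_i = G(X_i) + \epsilon_i,$$

where E(εi) = 0 and Var(εi) = σ², and suppose G can be
approximated locally, over the interval of indices
i-M, ..., i, ..., i+M, by the linear function

$$Y_j = B_0(X_i) + B_1(X_i)\,X_j + \epsilon_j, \qquad j = i-M, \ldots, i+M.$$

Then averaging the Yj over this range yields

$$\hat{Y}_i = \frac{1}{2M+1} \sum_{j=i-M}^{i+M} Y_j = B_0(X_i) + B_1(X_i)\,\bar{X}_i + \bar{\epsilon}_i,$$

where

$$\bar{X}_i = \frac{1}{2M+1} \sum_{j=i-M}^{i+M} X_j, \qquad \bar{\epsilon}_i = \frac{1}{2M+1} \sum_{j=i-M}^{i+M} \epsilon_j,$$

and

$$\mathrm{Var}(\hat{Y}_i) = \mathrm{Var}(\bar{\epsilon}_i) = \frac{\sigma^2}{2M+1}.$$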

If the εi are uncorrelated, this moving average process produces
estimated Ŷi's whose variance is smaller than that of the raw
Yi's by the factor 1/(2M+1). The estimates are also unbiased for
G(Xi) when the Xj are equally spaced, as in a time series,
because the average of the Xj over the window then equals Xi.
This technique makes it easier to distinguish G(Xi) through the
noise (εi). Using a bandwidth, M, larger than the interval over
which the linearity assumption holds will introduce bias into
the results [Ref. 4].

* A time series is a sequence of random variables Yi which are
naturally ordered by time (i) and can therefore be presented as
a scatter plot of Yi versus i. Although i usually takes integer
values, missing values can occur.
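The variance reduction can be checked numerically. The Python
sketch below is illustrative only: it assumes equally spaced X, a
G that is exactly linear, and independent Normal noise, and
`centered_moving_average` is a hypothetical helper. The mean of
the smoothing errors should come out near zero and their variance
near σ²/(2M+1) = 1/9 for M = 4.

```python
import numpy as np

def centered_moving_average(y, m):
    """Average each Y_i with its m neighbors on either side (2m + 1 values).

    Only interior points can be smoothed, so n - 2m values are returned;
    element k of the result is centered on Y_{k+m}.
    """
    window = np.full(2 * m + 1, 1.0 / (2 * m + 1))
    return np.convolve(np.asarray(y, dtype=float), window, mode="valid")

rng = np.random.default_rng(1)
n, m, sigma = 20_000, 4, 1.0
t = np.arange(n, dtype=float)              # equally spaced X, as in a time series
g = 0.5 + 0.01 * t                         # G is exactly linear here
y = g + rng.normal(0.0, sigma, size=n)     # Y_i = G(X_i) + e_i

resid = centered_moving_average(y, m) - g[m:n - m]
print(f"mean error {resid.mean():.4f}, error variance {resid.var():.4f}, "
      f"sigma^2/(2M+1) = {sigma**2 / (2 * m + 1):.4f}")
```

Replacing the linear `g` by the quadratic `0.01 * t**2` shifts
every smoothed value up by 0.01·M(M+1)/3 (about 0.067 for M = 4),
and the shift grows with M: this is the bias that comes from
averaging over a window in which G is not linear.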

The purpose of this thesis is to translate the generalization of
classical smoothing techniques proposed by Cleveland [Ref. 3],
and expounded upon by Chambers et al. [Ref. 1], into user
friendly computer programs available as exploratory data
analysis tools for students and faculty of the Naval
Postgraduate School.

LOWESS, written in APL (an acronym for "A Programming
Language"), was designed to be used alone or in conjunction with
the IBM GRAFSTAT statistical graphics package. GRAFSTAT, an
experimental program currently under development by the IBM
Watson Research Center, is available at the Naval Postgraduate
School for test and evaluation purposes [Ref. 5]. All graphs
contained in this paper were produced by the GENERAL PLOT
function of the GRAFSTAT program.

LOWS, a modification of LOWESS, when used in conjunction with
GRAFSTAT and expanded versions of the DRAFTSMAN DISPLAY programs
described in [Ref. 6], enhances an already powerful exploratory
data analysis package.

A FORTRAN version of the basic LOWESS program was designed to be
used in conjunction with either DISPLA [Ref. 7] or any other
graphing package supported by the W.R. Church computer system.

These programs are interactive and can be used easily by
individuals who have little or no APL or FORTRAN programming
skill. Users who are well versed in these languages should be
able to modify them to provide tailor made outputs, expand their
capabilities or incorporate them into other analysis packages.

Detailed user instructions are contained in Chapters IV and V,
while examples of their use are presented in Chapter III. Users
who are interested in the mathematical details of Robust Locally
Weighted Regression and Scatterplot Smoothing should read
Chapter II.


II. TECHNICAL DESCRIPTION OF LOWESS

A. OVERVIEW

Locally Weighted Regression and Scatterplot Smoothing (LOWESS) is
a generalized extension of the locally fitted polynomial
smoothing techniques used by many statisticians in time series
analysis.* Unlike its predecessors, however, LOWESS was designed
to work on unequally as well as equally spaced X's. It also
contains a robust fitting procedure that guards against possible
distortion of the smoothed curve by outlying points. The general
procedure used by Cleveland is an adaptation of the iterated
least squares regression techniques developed by Albert Beaton
and John Tukey [Ref. 8].

The overall objective of LOWESS, like that of most smoothing or
regression routines, is to compute a "fitted" value, Ŷ, that
depicts the middle of the empirical distribution of Y at each X.
Unfortunately, most data sets do not contain enough repeated
observations at each X to provide a good estimate of the middle
of this distribution. LOWESS therefore derives its estimate Ŷi
from the equation of a weighted least squares regression line
fitted to a set of data points whose X values are located in a
user defined neighborhood about Xi (the X value of the point
being smoothed).

B. MATHEMATICAL DETAILS: NON-ROBUST LOWESS SMOOTHING

The first step (STEP ONE) in generating a LOWESS smoothed point
consists of forming a neighborhood (Figure 2.1), centered around
Xi and comprised of its Q nearest neighbors. The user determines
Q by choosing the parameter F, which is approximately equal to
the fraction of the data points used in computing each fitted
value. Q is F × N rounded to the nearest integer, and the Q
nearest neighbors are those points whose X values are closest to
Xi. Note that there is not necessarily an equal number of
neighborhood points on either side of Xi. Also, Xi is considered
to be a neighbor of itself. The parameters F and Q, determined
prior to smoothing the first data point, are held constant and
used throughout the procedure.

* A brief theoretical explanation of these techniques was
presented in Chapter I.
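STEP ONE can be sketched in Python as follows. This is an
illustration under stated assumptions, not the thesis's APL or
FORTRAN code: the helper names are hypothetical, halves are
rounded upward when forming Q, and ties in distance are broken in
favor of the smaller index. With 20 evenly spaced points
(standing in for data set two, whose values are not reproduced
here) and F = .5, the neighborhood of X6 is X1 ... X10, as in
Figure 2.1.

```python
import math
import numpy as np

def neighborhood_size(f, n):
    """Q = F x N rounded to the nearest integer (halves rounded up), kept in [1, N]."""
    return min(n, max(1, math.floor(f * n + 0.5)))

def nearest_neighbors(x, i, q):
    """Indices of the q points whose X values are closest to x[i].

    x[i] counts as a neighbor of itself. A stable sort breaks ties in
    distance in favor of the smaller index.
    """
    x = np.asarray(x, dtype=float)
    dist = np.abs(x - x[i])
    return np.sort(np.argsort(dist, kind="stable")[:q])

x = np.arange(1.0, 21.0)          # X1 ... X20, evenly spaced (hypothetical)
q = neighborhood_size(0.5, len(x))
idx = nearest_neighbors(x, 5, q)  # index 5 holds X6
print(q, idx + 1)                 # Q and the subscripts of the neighbors of X6
```

Python's built-in `round` rounds halves to even (`round(2.5)` is
2), which is why the sketch uses floor(F·N + 0.5) for "rounded to
the nearest integer".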


Figure 2.1 Vertical Strip Containing the 10 Nearest Neighbors of
X6 in Data Set Two.

In Figure 2.1, the point to be smoothed, X6, is highlighted by a
dotted line, and the strip boundaries are delineated by solid
lines passing through X1 and X10.

STEP TWO consists of defining the local weighting function and
calculating individual weights for each point (Xk, Yk) in the
strip formed during STEP ONE. This weighting function is to be
centered at Xi and scaled so that it hits zero for the first
time at the Qth nearest neighbor of Xi (the strip boundary
furthest from Xi). Functions having the following properties
will satisfy these requirements:
1. W(u) > 0 for |u| < 1 (positivity),

2. W(-u) = W(u) (symmetry),

3. W(u) is a nonincreasing function for u ≥ 0,

4. W(u) = 0 for |u| ≥ 1.


Cleveland [Ref. 3] suggests using a tricube weight function of
the form:

$$W(u) = \begin{cases} (1 - |u|^3)^3 & \text{for } |u| < 1 \\ 0 & \text{otherwise} \end{cases}$$


Note that this function uses the absolute value of u. The weight
given to any point within the strip is calculated by:

$$w_k(X_i) = W\!\left(\frac{X_k - X_i}{D_i}\right)$$
The variable Di is the distance along the X axis from Xi to its
Qth nearest neighbor. For X6, this is the distance to the left
hand boundary in Figure 2.1. When LOWESS starts its smoothing
pass at X1, the right hand boundary passes through its Qth
nearest neighbor, X10 in this example. The Qth nearest neighbor
remains on the right hand boundary until the distance (Xi - X1)
exceeds (XQ - Xi); for evenly spaced data this usually occurs
near i = Q/2. From then on, as the neighborhood advances, the Qth
nearest neighbor lies on the left hand boundary, where it remains
until all of the data points have been smoothed. Di, therefore,
is generally the distance from Xi to the right hand boundary for
i = 1 ... Q/2 and the distance from Xi to the left hand boundary
for i = Q/2 ... N.
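STEP TWO can be sketched the same way. In this illustrative
Python fragment (hypothetical helper names, evenly spaced
stand-in data), Di is taken as the Qth smallest distance from Xi,
counting Xi itself, and the tricube weights follow the formula
above. For X6 with Q = 10 the weight is 1 at X6, falls off
symmetrically, and is exactly zero at the Qth nearest neighbor X1
and beyond.

```python
import numpy as np

def tricube(u):
    """W(u) = (1 - |u|**3)**3 for |u| < 1, and 0 otherwise."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def local_weights(x, i, q):
    """Return (D_i, weights) with weights w_k(X_i) = W((X_k - X_i) / D_i).

    D_i is the distance from X_i to its q-th nearest neighbor; X_i itself
    is the first neighbor (distance 0).
    """
    x = np.asarray(x, dtype=float)
    dist = np.abs(x - x[i])
    d_i = np.sort(dist)[q - 1]
    if d_i == 0.0:
        # All q neighbors share the abscissa X_i: each of them gets weight 1.
        return d_i, (dist == 0.0).astype(float)
    return d_i, tricube((x - x[i]) / d_i)

x = np.arange(1.0, 21.0)         # hypothetical X1 ... X20
d6, w = local_weights(x, 5, 10)  # index 5 holds X6
print(d6, np.round(w[:11], 3))
```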

The weight given to any point in the strip is equal to the height
of the curve, W(u), at Xk (Figure 2.2). This figure demonstrates
that the tricube weight function:

gives the largest weight to the point being smoothed;
decreases smoothly as Xk moves away from Xi;
is symmetric about the point being smoothed;
hits zero for the first time at the Qth nearest neighbor of Xi.


Figure 2.2 TRICUBE Weight Function for the 10 Nearest Neighbors
of X6 in Data Set Two.


In cases where several points have abscissas equal to Xi, all of
them are given weight 1. If Di is zero, meaning that all Q points
in the strip have abscissas equal to Xi, it is impossible to
estimate the slope of a fitted line. In this instance, the mean Y
value of the Q points is used as the fitted value at Xi.

STEP THREE uses weighted least squares regression to fit a
polynomial of degree P to the data points that lie within the
strip containing Xi. The parameters of the fitted polynomial are
the values of Bj, j = 0, 1, ..., P, that minimize:

$$\sum_{k} w_k(X_i)\,\left(Y_k - B_0 - B_1 X_k - \cdots - B_P X_k^P\right)^2$$


Figure 2.3 shows straight (P = 1) and quadratic (P = 2) curves
fit to the neighborhood points surrounding X6 in data set two.


[Figure 2.3 panels: LINEAR (left); QUADRATIC (right)]

Figure 2.3 Linear and Quadratic Fits.

The choice of an appropriate P depends on the user's perception
of the relationship between the points within each neighborhood,
the need for flexibility to reproduce patterns in the data, and
computational ease. The existence of physical theories that
define the relationships as being nonlinear might also influence
this choice. Smoothed curves based on higher order polynomial
regressions tend to follow abrupt pattern changes better than
those based on linear models. Cleveland [Ref. 3] feels that, for
values of P greater than 1, computational considerations begin
to override the need for flexibility.

The smoothing routine written for this thesis is capable of
performing linear or quadratic regressions. Using P = 1 or 2
should provide adequately smoothed points for any data set.
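STEP THREE, together with the Di = 0 rule above, can be sketched
as a small Python function. It is a minimal illustration under
our own assumptions, not the thesis program: the polynomial is
written in powers of (Xk - Xi) rather than Xk, which spans the
same polynomials and gives the same fitted value at Xi but makes
that value simply the constant coefficient. Multiplying each row
of the least squares problem by the square root of its weight
turns the weighted problem into an ordinary one. The name
`local_fit` is hypothetical, and the tie rule is applied whenever
every point with nonzero weight sits at Xi.

```python
import numpy as np

def local_fit(x, y, w, x0, degree=1):
    """Weighted least squares polynomial of the given degree, evaluated at x0.

    Minimizes sum_k w_k * (y_k - b_0 - b_1*(x_k - x0) - ... - b_P*(x_k - x0)**P)**2.
    If every point with nonzero weight has abscissa x0 (so no slope can be
    estimated), the weighted mean of those y values is returned instead.
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    keep = w > 0
    xs, ys, ws = x[keep], y[keep], w[keep]
    if np.all(xs == x0):
        return float(np.average(ys, weights=ws))
    sw = np.sqrt(ws)
    # Columns 1, (x - x0), (x - x0)**2, ... up to the requested degree.
    design = np.vander(xs - x0, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], ys * sw, rcond=None)
    return float(coef[0])  # value of the fitted polynomial at x0

x = np.arange(10.0)
print(local_fit(x, 2 + 3 * x, np.ones(10), 4.5))          # straight line
print(local_fit(x, x**2, np.ones(10), 3.0, degree=2))      # parabola, P = 2
print(local_fit([5, 5, 5], [1, 2, 3], [1, 1, 1], 5.0))     # tied abscissas
```

The three calls return 15.5, 9 and 2 up to rounding error: exact
data are reproduced by a fit of sufficient degree, and the tied
case falls back to the mean Y.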

The final step in the Locally Weighted Regression portion of the
LOWESS procedure is the determination of the smoothed point
(Xi, Ŷi) (Figure 2.4), where:

$$\hat{Y}_i = \sum_{j=0}^{P} b_j(X_i)\, X_i^{\,j}$$

and the bj(Xi) are the values of Bj found in STEP THREE. The
notation used here emphasizes that these coefficients are
different for each point Xi.


Figure 2.4 Scatter Plot of Data Set Two With Smoothed Point
(X6, Ŷ6) Superimposed.


LOWESS differs from most other smoothing routines because it
smooths all of the data points, including those at the ends of
the data set. This becomes important when smoothing small data
sets, when important pattern changes take place near the ends of
the data set, or when the smoothed curve is to be used as a
regression line to predict future trends. Figure 2.5 summarizes
the sequence of steps described above, as they are used to
compute a "fitted" value for (X20, Y20), the right hand end point
in data set two.

A comparison of Figures 2.1 and 2.5 reveals that the widths of
the vertical strips about (X6, Y6) and (X20, Y20) are not equal.
Note that the ten nearest neighbors of X20 are all to the left.
Although both strips contain ten data points, centering each
weighting function on its own point (Xi, Yi) forces the right
hand portion of the weighting function in Figure 2.5 to fall
off-scale. The left hand portion of the weighting function for
(X1, Y1) is forced off scale for the same reason. These partial
weighting functions still fulfill all of the requirements
outlined earlier, however. Unequal spacing of the X's also
creates variable strip widths.

Figure 2.5 Summary of Steps Required for Computing the Smoothed
Value at (X20, Ŷ20) in Data Set Two.


A set of smoothed data points (Figure 2.6) is obtained by
completing the aforementioned steps for each point in the data
set.

Figure 2.6 Plots of LOWESS Smoothed Data Points and Smoothed
Curve Superimposed on Data Set Two (F = .5).
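Putting STEPS ONE through THREE together for every point gives
the non-robust smoother. The following self-contained Python
sketch is one way to write it, assuming local linear fits by
default, halves rounded up when forming Q, and the Di = 0 rule
handled through the least squares solver (when every weighted
point sits at Xi, the minimum-norm solution reduces to their mean
Y). The function name `lowess_nonrobust` is hypothetical. Because
each local fit is exact for data that really are linear, a
straight line is returned unchanged, which makes a convenient
check.

```python
import math
import numpy as np

def tricube(u):
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def lowess_nonrobust(x, y, f=0.5, degree=1):
    """One locally weighted regression pass, smoothing every point (X_i, Y_i)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    q = min(n, max(1, math.floor(f * n + 0.5)))    # STEP ONE: Q = F x N, rounded
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        d_i = np.sort(dist)[q - 1]                   # distance to Q-th nearest neighbor
        if d_i > 0.0:
            w = tricube((x - x[i]) / d_i)            # STEP TWO: tricube weights
        else:
            w = (dist == 0.0).astype(float)          # all Q neighbors tie with X_i
        keep = w > 0
        sw = np.sqrt(w[keep])
        design = np.vander(x[keep] - x[i], degree + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y[keep] * sw, rcond=None)
        fitted[i] = coef[0]                          # STEP THREE: fitted value at X_i
    return fitted

x = np.arange(20.0)
print(np.allclose(lowess_nonrobust(x, 1.0 + 2.0 * x), 1.0 + 2.0 * x))   # True
```

Points tied with Xi receive u = 0 and therefore weight 1, as the
text requires, and tied points always receive identical fitted
values because their neighborhoods and weights are identical.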

C. MATHEMATICAL DETAILS: ROBUST LOWESS SMOOTHING

The robust smoothing feature of LOWESS prevents a small number of
outliers from distorting the smoothed curve. The point
(X10, Y10) in Figure 2.1 is one such outlier.

The robust procedure computes a new set of weights for each
(Xi, Yi) based on the size of the residuals, (Yi - Ŷi), obtained
after the first smoothing pass (Figure 2.7).

Cleveland [Ref. 3] suggests using a bisquare function of the
form:

$$D(v) = \begin{cases} (1 - v^2)^2 & \text{for } |v| < 1 \\ 0 & \text{otherwise} \end{cases}$$

Robustness weights for each point are calculated by:

$$\delta_k = D\!\left(\frac{Y_k - \hat{Y}_k}{6M}\right)$$

where M is the median of the absolute values of the residuals
(Figure 2.8). This quantity is sometimes referred to as the
Median Absolute Deviation (MAD).
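The robustness weights can be sketched as follows. This Python
fragment assumes Cleveland's scaling of the residuals by six
times their median absolute value, as in the formula above; the
helper names are hypothetical, and leaving all weights at 1 when
M is zero is our own choice for that degenerate case. A single
gross residual receives weight 0, while the ordinary residuals
keep weights close to 1.

```python
import numpy as np

def bisquare(v):
    """D(v) = (1 - v**2)**2 for |v| < 1, and 0 otherwise."""
    v = np.abs(np.asarray(v, dtype=float))
    return np.where(v < 1.0, (1.0 - v**2) ** 2, 0.0)

def robustness_weights(residuals):
    """delta_k = D(r_k / (6 M)), where M is the median of |r_k|."""
    r = np.asarray(residuals, dtype=float)
    m = np.median(np.abs(r))
    if m == 0.0:
        # Degenerate scale (at least half the residuals are exactly zero):
        # this sketch leaves every weight at 1 rather than dividing by zero.
        return np.ones_like(r)
    return bisquare(r / (6.0 * m))

resid = [0.1, -0.2, 0.15, -0.1, 5.0]   # hypothetical residuals; the last is an outlier
print(np.round(robustness_weights(resid), 3))
```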
Figure 2.7 Residuals (Yi - Ŷi) Versus Xi for the Non-Robust
Smoothed Points of Data Set Two.

Figure 2.8 Robust Weighting Function for the First Pass Through
Data Set Two.

This scheme gives small weights to points associated with large
residuals and large weights to points with small residuals. One
iteration of the robust locally weighted regression procedure is
completed by calculating a new set of "fitted" values using the
weighting function

$$WT = W(u) \cdot D(v), \quad \text{that is,} \quad WT_k = w_k(X_i)\,\delta_k,$$

in STEP THREE.


Execution of the entire LOWESS algorithm, consisting of one
locally weighted regression pass and two robust locally weighted
regression passes, produces a robust smoothed curve (Figure 2.9).
The effect of the "outlier" can be seen very clearly.
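The complete procedure, one ordinary pass followed by two robust
passes, can be sketched in Python as below. It is an illustration
rather than the thesis program: it uses local linear fits, the 6M
residual scaling assumed above, and two fallbacks of our own
(stop early if M is zero; ignore the robustness weights at a
point where they would remove every neighbor). The function name
`lowess` is hypothetical. The example places one gross outlier at
X10 in otherwise nearly linear stand-in data.

```python
import math
import numpy as np

def tricube(u):
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def bisquare(v):
    v = np.abs(v)
    return np.where(v < 1.0, (1.0 - v**2) ** 2, 0.0)

def lowess(x, y, f=0.5, robust_passes=2):
    """Robust LOWESS sketch: one ordinary pass plus `robust_passes` passes
    in which each neighbor is weighted by WT = W(u) * D(v)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    q = min(n, max(1, math.floor(f * n + 0.5)))
    delta = np.ones(n)                        # robustness weights, all 1 at first
    fitted = np.empty(n)
    for pass_no in range(robust_passes + 1):
        for i in range(n):
            dist = np.abs(x - x[i])
            d_i = np.sort(dist)[q - 1]
            local = tricube((x - x[i]) / d_i) if d_i > 0 else (dist == 0).astype(float)
            w = local * delta                 # WT = W(u) * D(v)
            if not np.any(w > 0):
                w = local                     # fallback (assumption): drop robustness weights
            keep = w > 0
            sw = np.sqrt(w[keep])
            design = np.column_stack([np.ones(keep.sum()), x[keep] - x[i]])
            coef, *_ = np.linalg.lstsq(design * sw[:, None], y[keep] * sw, rcond=None)
            fitted[i] = coef[0]
        if pass_no == robust_passes:
            break
        resid = y - fitted
        m = np.median(np.abs(resid))
        if m == 0.0:
            break                             # residual scale is zero: nothing to downweight
        delta = bisquare(resid / (6.0 * m))
    return fitted

x = np.arange(20.0)
y = 0.5 * x + 0.1 * (-1.0) ** np.arange(20)   # nearly linear stand-in data
y[9] += 10.0                                   # one gross outlier at X10
print(round(lowess(x, y, robust_passes=0)[9], 2), round(lowess(x, y)[9], 2))
```

The non-robust value at X10 is pulled well above the underlying
line (about 6.2 instead of 4.5), while the robust value stays
close to 4.5, which is the behavior Figure 2.9 illustrates.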


[Figure 2.9 panels: NON-ROBUST LOWESS, F = .5 (left); ROBUST LOWESS, F = .5 (right)]

Figure 2.9 Comparison of Non-Robust and Robust LOWESS Smoothing
of Data Set Two (F = .5).

Cleveland [Ref. 3] reports that the number of computations
required to complete the LOWESS algorithm on an entire data set
is on the order of FN². For example, 60 linear regressions (one
per point in each of the three passes) were used to complete the
robust smoothing of the 20 artificial data points in Figure 2.9.
The non-robust curve, on the other hand, required 2/3 fewer
calculations and took less than 1/2 the time. The number of
calculations required to produce a smoothed curve presents no
significant problem for plots of fewer than 100 points.
Computational time can be saved by grouping the Xi's in data
sets that have repeated X values. This saving results from the
fact that if Xi+1 = Xi, then Ŷi+1 = Ŷi. Assigning the same Ŷi
value to each of the Ni repeated Xi's reduces the number of
regressions required by Ni for non-robust smoothing and by 3Ni
for robust smoothing.
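The bookkeeping in this paragraph is easy to check. The short
Python function below (a hypothetical helper) counts one local
regression per distinct X value per pass. It reproduces the 60
regressions quoted for 20 distinct points and three passes, the
20 needed for the non-robust curve, and the saving from grouping
repeated X values (18 regressions without grouping and 9 with it
for the six tied values in the last line).

```python
import numpy as np

def regressions_needed(x, robust_passes=2, group_ties=True):
    """Number of local regressions for one ordinary pass plus `robust_passes` robust passes.

    With grouping, tied X values share a single fit in every pass, because
    they have identical neighborhoods and identical weights.
    """
    fits_per_pass = len(np.unique(x)) if group_ties else len(x)
    return (1 + robust_passes) * fits_per_pass

print(regressions_needed(range(20)))                    # robust, 20 distinct X values
print(regressions_needed(range(20), robust_passes=0))   # non-robust
x_tied = [1, 1, 1, 2, 3, 3]
print(regressions_needed(x_tied, group_ties=False), regressions_needed(x_tied))
```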
D. CHOOSING F


There are no set criteria for choosing F. Small values produce
curves with high resolution and a lot of noise; larger F's
produce curves with low resolution and less noise, but require
increased computational time. In general, increasing F tends to
produce smoother curves (Figure 2.10). Cleveland [Ref. 3]
suggests that values between .2 and .8 should be satisfactory
for most purposes. The goal is to choose the largest F that
minimizes the variability in the smoothed points without
distorting patterns in the data. Computational time may become a
consideration in choosing F when smoothing large data sets. In
general, though, F will decrease as the series length increases.


[Figure 2.10 panels: ROBUST LOWESS with F = .2, .3, .5 and .7]


Smoothing routines, LOWESS included, do not provide regression
equations or other analytical results on which to test goodness
of fit. The user must judge the adequacy of the results. The
choice of F is not so critical for cases in which the purpose of
the smoothing is to enhance the visual perception of gross
patterns in the data. For example, the rough curve obtained by
using F = .2 on data set two, the left hand plot of Figure 2.10,
provides an adequate picture of an overall increasing trend. More
care must be taken in some applications, such as time series
analysis, or when the smoothed (Xi, Ŷi) values may be used as a
type of regression function, or finally, when the smoothed curve
may be presented without an accompanying plot of the original
data points. Taking F = .5 is a reasonable choice when there is
no clear idea of what is needed [Ref. 3]. Chambers [Ref. 1]
suggests that it is often wise to try several values of F before
selecting the "best" one for a particular application.

Techniques for determining the bandwidth by cross-validation
have been considered by Cleveland [Ref. 3] and Rice [Ref. 9], but
are not included here.
III. EVALUATION OF THE LOWESS CURVE SMOOTHING PROGRAM


A.  GENERAL 

Smoothing routines are generally used to filter noisy data and
approximate underlying relationships that may be too complex to
describe mathematically or too difficult to fit by simple
polynomial regression. Effective routines must be flexible and
local. They must allow the data to determine the shape of the
smoothed curve, and they must be able to follow abrupt as well
as smooth changes in curvature. This evaluation tests LOWESS in
each of these areas.

B. METHODOLOGY

LOWESS, like most other curve smoothing schemes, provides no
analytical solutions by which to measure its effectiveness. The
correctness or adequacy of the fit must be judged subjectively,
and there are no standard guidelines to follow. Sometimes the
shape of the fit can be checked by comparing it to the physical
laws that govern the application at hand. The programs written
to support this thesis were evaluated by:

1. examining their performance on a set of test data for which
the underlying functional relationships were known;

2. comparing their results with those obtained from widely used
and previously validated curve smoothing techniques, namely
LEAST SQUARES REGRESSION, MOVING AVERAGE and COSINE ARCH weighted
smoothing.

The theory of moving average procedures dates back to the
definitive studies of discrete time series models completed by
H. Wold in the mid 1930's. The general process is based on the
assumptions and theories recounted in Chapter I. The moving
average is defined by the expression

$$X(T) = \sum_{j=-M}^{N} A_j\, Z(T-j), \qquad T = 0, 1, \ldots$$

where M and N are nonnegative integers and the weighting
coefficients Aj are real constants. Kendall and Stuart [Ref. 4]
and Koopmans [Ref. 10] present in-depth discussions and
theoretical derivations that expand on the ideas presented in
Chapter I. The moving average routine employed in this analysis
is contained in the IBM GRAFSTAT statistical graphics package.
The weighting function used in that program takes the form


$$A_j = \frac{1}{M}, \qquad j = -M, \ldots, N$$
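The defining sum can be sketched directly in Python. This is an
illustration under our own assumptions, not GRAFSTAT's routine:
the helper name is hypothetical, the weights are passed in as
[A(-M), ..., A(N)], and the example uses equal weights normalized
to sum to one. The sketch also makes the end losses visible:
X(T) is only defined where every Z(T - j) exists, so a series of
length L yields L - (M + N) smoothed values.

```python
import numpy as np

def moving_average(z, a, m):
    """X(T) = sum_{j=-m}^{n} A_j Z(T - j) for every T where all Z(T - j) exist.

    `a` holds the weights [A_{-m}, ..., A_n], so n = len(a) - 1 - m.
    """
    z = np.asarray(z, dtype=float)
    a = np.asarray(a, dtype=float)
    n = len(a) - 1 - m
    out = []
    for t in range(n, len(z) - m):        # need T - n >= 0 and T + m <= len(z) - 1
        out.append(sum(a[j + m] * z[t - j] for j in range(-m, n + 1)))
    return np.array(out)

z = [1, 2, 3, 4, 5, 6]
equal = np.full(3, 1 / 3)                 # M = N = 1, equal weights summing to one
print(moving_average(z, equal, m=1))      # 4 values from 6 points: the ends are lost
```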

The COSINE ARCH smoothing procedure used here is a moving average
process that uses a cosine weighting function of the form


$$A_j = \frac{1}{M+1}\left[1 - \cos\!\left(\frac{2\pi (j+1)}{M+1}\right)\right], \qquad j = 0, 1, \ldots, M-1$$


It is characterized as a good smoother by Anscombe [Ref. 11] and
is often used as a trend remover during time series analysis.
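The cosine arch weights, read as in the formula above, can be
computed and checked in a few lines of Python. This is a sketch
under that reading of the formula (the helper names are
hypothetical): the M weights are symmetric, largest in the
middle, and sum to one, since the cosine terms over
j = 0, ..., M-1 add up to -1. For M = 5 they are 1/12, 1/4, 1/3,
1/4 and 1/12. Applied as a moving average, they return fewer
values than the input (L - M + 1 for a series of length L in this
sketch).

```python
import numpy as np

def cosine_arch_weights(m):
    """A_j = (1 - cos(2*pi*(j + 1) / (m + 1))) / (m + 1), j = 0, ..., m - 1."""
    j = np.arange(m)
    return (1.0 - np.cos(2.0 * np.pi * (j + 1) / (m + 1))) / (m + 1)

def cosine_arch_smooth(z, m):
    """Weighted moving average of z using the m cosine arch weights.

    The weights are symmetric, so the reversal performed by convolution
    does not change them.
    """
    return np.convolve(np.asarray(z, dtype=float), cosine_arch_weights(m), mode="valid")

w = cosine_arch_weights(5)
print(np.round(w, 3), w.sum())
```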

C. TESTING PROCEDURES AND RESULTS


Three sets of test data were developed to check all aspects of
the LOWESS program's capabilities: its ability to follow linear
trends as well as abrupt and smooth changes in curvature.


1. Phase One: Linear Trends

Test set one (Figure 3.1) consists of 150 data points having the
following functional relationship:

$$Y = X + \text{NORMAL}(0,1) \text{ noise}, \qquad 0 \le X \le 10$$

It was designed to test LOWESS' ability to detect linear trends
in noisy data. Although this test appears redundant, many complex
smoothing procedures have failed because they did not return
straight lines when that was the shape of the underlying curve.
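Data of this kind, and the least squares comparison used below,
can be generated with a short Python sketch. The thesis does not
state how the 150 X values were spaced, so evenly spaced values
on [0, 10] and the random seed are assumptions; this reproduces
the construction of test set one, not its actual values. With 150
points the fitted slope and intercept land close to 1 and 0, and
the residual variance is close to that of the N(0,1) noise.

```python
import numpy as np

rng = np.random.default_rng(12345)            # arbitrary seed (assumption)
x = np.linspace(0.0, 10.0, 150)              # evenly spaced X (assumption)
y = x + rng.normal(0.0, 1.0, size=x.size)    # Y = X + NORMAL(0,1) noise

slope, intercept = np.polyfit(x, y, deg=1)   # linear least squares regression
resid = y - (intercept + slope * x)
print(f"slope={slope:.3f} intercept={intercept:.3f} "
      f"resid mean={resid.mean():.3f} resid var={resid.var(ddof=1):.3f}")
```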


Figure 3.1 Test Set One With and Without N(0,1) Noise.

The adequacy of LOWESS' performance on test set one was measured
by comparing it with a linear least squares regression line
fitted to the same data.

As pointed out in Chapter II, LOWESS produces increasingly
smoother curves as the parameter F approaches 1. When F = 1, each
neighborhood used throughout the smoothing process contains
Q = 1 × N = N points. This implies that each smoothed point
(Xi, Ŷi) is computed from the equation of a TRICUBE weighted
regression line fitted to all of the data. This procedure should
produce a LOWESS smoothed curve that closely resembles the linear
regression of Y on X. The TRICUBE weighting function used in
LOWESS may cause minor disparities between the two "fits,"
however. A visual inspection of the bottom two plots in
Figure 3.2 reveals that LOWESS and the linear regression produced
nearly identical "fits."
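Why F = 1 should reproduce the regression line can be seen in a
small Python sketch (hypothetical helper, evenly spaced stand-in
data). When every neighborhood contains all N points, each local
fit is a tricube-weighted straight line through the whole data
set. For data that are exactly linear every such fit returns the
same line, so LOWESS and least squares agree to rounding error;
with noisy data, the different tricube weights used at each Xi
are what produce the minor disparities mentioned above.

```python
import numpy as np

def lowess_f1(x, y):
    """Non-robust local linear LOWESS with F = 1, so Q = N for every point."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty(len(x))
    for i in range(len(x)):
        d_i = np.max(np.abs(x - x[i]))                 # Q-th (here the farthest) neighbor
        u = np.abs(x - x[i]) / d_i
        w = np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)  # tricube weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones_like(x), x - x[i]])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted

x = np.linspace(0.0, 10.0, 150)
y_line = 2.0 + 0.5 * x                                 # exactly linear (hypothetical)
slope, intercept = np.polyfit(x, y_line, deg=1)
print(np.max(np.abs(lowess_f1(x, y_line) - (intercept + slope * x))))
```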


[Figure 3.2 panels include: LOWESS, F = .7; LOWESS, F = .5]
Figure 3.2 Comparison of LOWESS Smoothing and Linear Regression
of Test Set One.


Goodness of fit can be measured by examining the residuals
(Yi - Ŷi) from each smoothing procedure. A perfect reproduction
of the underlying functional relationship, Y = X, would produce a
set of residuals distributed Normal(0, 1), the same distribution
found in the noise. The results of the GRAFSTAT distribution
fitting procedure summarized in Table II indicate that the
distribution of the regression residuals can be approximated as
Normal(0, 1.04), while the LOWESS residuals are approximately
Normal(.002, 1.016).

Hypothesis tests comparing the means and variances of these
distributions with those of the Normal(0, 1) distributed noise
provide some measure of the goodness of fit of each smoothing
scheme. The results of these tests, conducted at the 95%
confidence level, are summarized in Table I.
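The thesis does not say which test statistics were used. One
conventional choice, sketched here in Python as an assumption
rather than a reconstruction of Table I, is a z test of the mean
against the known noise variance and a chi-square test of the
variance, with the chi-square(n - 1) distribution replaced by its
large-sample normal approximation so that only the standard
library is needed. The function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance

def residual_tests(residuals, alpha=0.05):
    """Two-sided tests of H0: mean = 0 and H0: variance = 1 for a list of residuals.

    Mean: z = mean / (1 / sqrt(n)), using the known N(0, 1) noise variance.
    Variance: chi2 = (n - 1) * s**2 / 1, standardized with the large-sample
    normal approximation chi2(n - 1) ~ N(n - 1, 2(n - 1)).
    """
    n = len(residuals)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = .05
    z_mean = fmean(residuals) * math.sqrt(n)
    chi2 = (n - 1) * variance(residuals)
    z_var = (chi2 - (n - 1)) / math.sqrt(2.0 * (n - 1))
    return {"z_mean": z_mean, "reject_mean": abs(z_mean) > z_crit,
            "z_var": z_var, "reject_variance": abs(z_var) > z_crit}

# Hypothetical residuals: 150 values of +1 and -1 have mean 0 and variance near 1.
print(residual_tests([1.0, -1.0] * 75))
```

Neither hypothesis is rejected for these made-up residuals, while
residuals of +2 and -2 (variance near 4) lead to rejecting the
variance hypothesis.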

The output of the GRAFSTAT distribution fitting procedure
presented in Table II and the hypothesis tests summarized in
Table I suggest that there is no significant difference between
the distribution of the residuals from the linear regression or
the LOWESS smoothing of test set one and the Normal(0, 1) noise
incorporated into the data. This provides strong support for the
premise that LOWESS depicts linear trends very well. Visual
comparison of the LOWESS smooths in Figure 3.2 confirms that
LOWESS follows the same general trend regardless of which F is
used; small values produce rougher curves that have the same
general slope.


TABLE I

TABLE II

Summary of GRAFSTAT Distribution Fitting of Residuals from
Regression and LOWESS Smooths of Test Set One

[Table II column: RESIDUALS FROM LOWESS SMOOTHING]
2. Phase Two: Abrupt Changes in Curvature

Test set two, Figure 3.3, consisting of 220 data points having the following mathematical relationship

$$
Y = \begin{cases}
.4X + \text{N}(0,1)\ \text{noise}, & 0 \le X \le 10 \\
3 + .1X + \text{N}(0,1)\ \text{noise}, & 10 < X \le 25 \\
14.6 - 3.67X + \text{N}(0,1)\ \text{noise}, & 25 < X \le 40 \\
10 + \text{N}(0,1)\ \text{noise}, & 40 < X \le 44
\end{cases}
$$

was used to test LOWESS' ability to handle abrupt pattern changes. The smooth of test set two generated by LOWESS was compared to those produced by MOVING AVERAGE and COSINE ARCH filtering of the same data.


Determining the amount of smoothing required by a data set is perhaps the most difficult aspect of using any curve smoothing routine. Smoothness is controlled by the size of the parameter F in LOWESS and by the parameter M (bandwidth) in MOVING AVERAGE and COSINE ARCH smoothing. These parameters determine the number of points, or neighborhood size, used to compute each smoothed value. The goal, regardless of the method chosen, is to use the largest neighborhood that minimizes the variability in the smoothed points without distorting patterns in the data. Another factor that must be considered when choosing M is that MOVING AVERAGE and COSINE ARCH smoothing routines produce only (N - M) smoothed points. Proportionately large values of M, therefore, might result in losing significant portions of the original pattern at the ends. This shortcoming will be evident in the graphical comparisons made throughout the remainder of this chapter.

Comparison tests made during phases two and three of this evaluation used selected LOWESS smooths and corresponding MOVING AVERAGE and COSINE ARCH smoothed curves. Parameters for the three processes are directly convertible by the relationship M = F·N.
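The conversion between F and M, and the loss of end points described above, can be checked with a small sketch. It assumes an equal-weight MOVING AVERAGE whose window spans M + 1 points (half-width m = M/2 on each side), which is one reading of the (N - M) count in the text; the function names are hypothetical.

```python
def span_to_window(f, n):
    """Convert a LOWESS span F to the equivalent moving-average parameter M = F*N."""
    return round(f * n)


def moving_average(y, m_param):
    """Equal-weight centred moving average with half-width m = M // 2.

    Each smoothed value averages 2m + 1 consecutive points, so the first and
    last m points cannot be smoothed: N - 2m values come back (N - M when M is even).
    """
    half = m_param // 2
    width = 2 * half + 1
    return [sum(y[i - half:i + half + 1]) / width
            for i in range(half, len(y) - half)]
```

With N = 220, F = .25 gives M = 55 and F = .2 gives M = 44; for M = 44 the sketch returns 176 smoothed values, losing 22 points at each end of the data.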

Figure 3.4 presents graphical comparisons of LOWESS smooths (solid lines) using parameter values F = .15, .25, .50 and .75 to illustrate some of the considerations made during the parameter selection phase of a smoothing operation. The exact underlying relationships (dashed lines) were included to demonstrate how large values of F can cause pattern distortion.

It is apparent from the sequence of illustrations in Figure 3.4 that LOWESS produces smoother curves as F increases. The smoothest curves are not always the most desirable, however. The bottom two curves (F = .50 and F = .75) have distorted the original pattern by using too many points to compute the smoothed values. Test set two contains 50 points in the segment (0 ≤ X ≤ 10). Using a neighborhood much larger than 220 × .25 = 55 points on this data set would tend to fit the wrong slope to the first linear segment. Additionally, it would cause oversmoothing of the corners. Figure 3.5 shows the neighborhood and linear regression used to smooth the point (X10, Y10) during production of the smoothed curve (F = .75) pictured in the lower right corner of Figure 3.4. It is easy to see that following this slope would distort the pattern presented by the data.
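The local fitting step pictured in Figure 3.5 can be sketched as follows. This is an illustrative non-robust version only, assuming tricube neighborhood weights and a neighborhood of q = F·N points (at least 3), as in Cleveland's original LOWESS proposal; Chapter II remains the authority on the procedure actually programmed, and the function names here are hypothetical.

```python
def tricube(u):
    """Tricube weight (1 - |u|^3)^3 for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(x, y, i, q, robustness=None):
    """Smoothed value at x[i] from a weighted straight-line fit to its neighbourhood.

    h is the distance from x[i] to its q-th nearest point (x[i] itself counts),
    so points closer than h receive positive tricube weight.
    """
    h = sorted(abs(xj - x[i]) for xj in x)[q - 1]
    if h > 0:
        w = [tricube((xj - x[i]) / h) for xj in x]
    else:  # the q nearest points all share the value x[i]
        w = [1.0 if xj == x[i] else 0.0 for xj in x]
    if robustness is not None:  # optional extra weights, e.g. from a robust pass
        w = [wj * rj for wj, rj in zip(w, robustness)]
    sw = sum(w)
    if sw == 0.0:
        return y[i]
    mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
    my = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
    sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my + slope * (x[i] - mx)


def lowess_pass(x, y, f, robustness=None):
    """One smoothing pass over every point, with q = round(F * N) neighbours (at least 3)."""
    n = len(x)
    q = min(n, max(3, round(f * n)))
    return [local_linear(x, y, i, q, robustness) for i in range(n)]
```

Because every point gets its own neighborhood, the sketch returns all N smoothed values, and data lying exactly on a straight line are reproduced exactly by the local linear fits.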


Figure 3.5 Linear Regression Step in Smoothing (X10, Y10) in Test Set Two Using LOWESS With F = .75.


The F = .15 plot depicted in Figure 3.4 demonstrates that small F's create very locally smoothed curves that contain a great deal of noise but follow gross patterns very well. Using a small F is an excellent idea if the sole purpose of the smoothing is to highlight major trends in the data.

The LOWESS smoothed curve obtained by using F = .25 is the one best suited for comparison with the corresponding MOVING AVERAGE and COSINE ARCH smooths, Figure 3.6.

[Figure 3.6 panel residue: TEST SET TWO; LOWESS F = .2.]

[...] the curve or fit the ends by eye. Applying these techniques to the bottom curves in Figure 3.6 does not reveal any significant pattern changes. LOWESS, although it does not follow the level trend accurately, does reveal a major pattern change in the last section of the data.

All three procedures tend to round sharp corners as the parameters F and M are increased. The MOVING AVERAGE curve, in the lower left, has a very rounded shape and does not highlight the linear trend in segments one or two. The COSINE ARCH filter does a little better. It portrays the linearity of section three with nearly the correct slope but fits segments one and two with one smooth curve. Additionally, it has added a misleading hump at the intersection of segments two and three. LOWESS is the only procedure that clearly pictures the underlying pattern as a series of straight lines. An experienced user who understands that LOWESS rounds corners could almost duplicate the original pattern by connecting the linear portions of the curve.

Smoothing procedures are judged not only on their ability to depict patterns but also on their ability to filter out unwanted noise. Gross differences in these capabilities can be picked out easily in a graphical comparison. It is readily apparent that the MOVING AVERAGE curve in Figure 3.6 is much noisier than either the LOWESS or COSINE ARCH smooths.

A more analytical measure of a procedure's smoothing ability can be made by comparing periodograms of the unfiltered and filtered data. A periodogram is an analysis technique used to estimate the spectral density function of a time series at the periodic frequencies $\lambda_v$. The periodogram function is defined by

$$
I_N(\lambda_v) = \frac{1}{2\pi N}\left|\sum_{t=1}^{N} X(t)\, e^{-i\lambda_v t}\right|^2, \qquad \lambda_v = \frac{2\pi v}{N}.
$$

Refer to Koopmans [Ref. 10], chapter 8, for a detailed discussion of the periodogram and its distributional properties. The periodograms in Figure 3.7 provide [...] periodicities, the spectral frequencies of which are measured along the abscissa. The height of the lines is an indicator of the significance of the associated frequencies. The plots in Figure 3.7 were truncated at Y = 6 to prevent the minor frequencies from being obscured.

[Figure 3.7 panel residue: TEST SET TWO WITH NOISE; TEST SET TWO WITHOUT NOISE; LOWESS F = .2; MOVING AVERAGE M = 44; frequency on the abscissa.]
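A periodogram of the form $I(\lambda_v) = |\sum_t X(t) e^{-i\lambda_v t}|^2 / (2\pi N)$ can be computed directly from its definition. The sketch assumes this $1/(2\pi N)$ normalization and the Fourier frequencies $\lambda_v = 2\pi v/N$ for v = 1, ..., N/2; other texts scale the periodogram differently, which changes the heights of the lines but not where the peaks fall. The function name is hypothetical.

```python
import cmath
import math


def periodogram(series):
    """Return (lambda_v, I(lambda_v)) at the Fourier frequencies 2*pi*v/N, v = 1..N//2.

    I(lambda) = |sum_t X(t) exp(-i*lambda*t)|^2 / (2*pi*N), t = 1..N.
    A direct O(N^2) sum, adequate for a few hundred points.
    """
    n = len(series)
    result = []
    for v in range(1, n // 2 + 1):
        lam = 2.0 * math.pi * v / n
        total = sum(xt * cmath.exp(-1j * lam * t)
                    for t, xt in enumerate(series, start=1))
        result.append((lam, abs(total) ** 2 / (2.0 * math.pi * n)))
    return result
```

Comparing the ordinates at the higher frequencies for the raw data and for a smoothed version is the kind of comparison made in Figure 3.7: noise that survives smoothing shows up as line height at high frequencies.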

A visual inspection of these periodograms reveals that LOWESS produces the smoothest (most noise-free) curve. In fact, the periodograms of the LOWESS curve and of the noise-free data are nearly identical.

All  of  this  evidence  supports  the  conclusion  that 
LOWESS  performs  at  least  as  well  on  data  sets  that  contain 
abrupt  changes  in  curvature  as  do  the  widely  accepted  MOVING 
AVERAGE  and  COSINE  ARCH  procedures. 

3. Phase Three: Smooth Changes in Curvature

Test set three, Figure 3.8, comprising 100 data points with the relationship

$$
Y = \sin X + \text{N}(0,1)\ \text{noise}, \qquad 0 \le X \le 2\pi
$$

was used to evaluate LOWESS' ability to follow smooth changes in curvature. The same procedures used in the preceding section to test LOWESS' ability to handle abrupt pattern changes were applied here.

Test set three appears either to have a negative linear trend or to cycle about the line Y = 0. A series of LOWESS smooths, Figure 3.9, starting with a small F parameter, was used to discover the general pattern (dashed line) and refine the resulting smoothed curve (solid line). The distorted smooth in the lower right-hand plot demonstrates the inherent danger in selecting a large F if only one smoothing pass is planned.


Figure 3.8 Test Set Three With and Without N(0,1) Noise.
Figure 3.9 Comparison of LOWESS Smoothing of Test Set Three Using Different Values of the Parameter F.


The LOWESS curve obtained by using F = .25 provided the most smoothing without distorting the pattern and was used in a direct comparison with the corresponding MOVING AVERAGE and COSINE ARCH smooths, Figure 3.10. The LOWESS smooth is the only curve that has the characteristic sinusoidal shape. The MOVING AVERAGE plot, although very noisy, would present the proper picture if the ends of the curve were extended. The radical change in curvature on the left end of the COSINE ARCH smoothed curve detracts from its ability to represent the true shape of test set three.


Figure 3.10 Comparison of LOWESS, MOVING AVERAGE and COSINE ARCH Smoothing of Test Set Three.

Comparison of the periodograms presented in Figure 3.11 shows, once again, that LOWESS produces the smoothest curve, while Figure 3.10 shows that it seems to follow the model best.
Figure 3.11 Comparison of Periodograms of LOWESS, MOVING AVERAGE and COSINE ARCH Smoothing of Test Set Three.


The graphical comparisons made in Figures 3.10 and 3.11 demonstrate clearly that LOWESS performs at least as well as the MOVING AVERAGE and COSINE ARCH routines when smoothing data that has a smooth curvilinear pattern.


4. Phase Four: Unequal Spacing

Besides being able to smooth all of the data points, LOWESS enjoys another possible advantage over MOVING AVERAGE type procedures, in that it was designed to work on unequally as well as equally spaced data. The definition of MOVING AVERAGES,

$$
\hat{Y}_i = \sum_{j=-m}^{m} A_j Y_{i-j},
$$

holds only if the $Y_i$'s are equally spaced and have a linear relationship over the interval $(i-m), \ldots, (i+m)$. Violation of the linearity assumption introduces bias into the results, while violation of the equal spacing requirement invalidates them. LOWESS would indeed enjoy a distinct advantage over MOVING AVERAGE type smoothing procedures if it produces acceptable results on irregularly spaced data.

This section examines LOWESS' ability to smooth two different sets of this type of data. The first, the natural log of energy dissipation versus depth, Figure 3.12, is a transformed portion of data collected during a turbulence-measuring experiment conducted by the Department of Oceanography, U.S. Naval Postgraduate School.

The LOWESS curves obtained by using linear and quadratic regressions during Step Three of the smoothing procedure were compared to a quadratic least squares regression line fit to the same data, Figure 3.13.

Higher order regressions were rejected as plausible solutions because the regression coefficients $B_j$, $j = 3, 4, 5, \ldots$ were found to be statistically insignificant compared to the $B_j$, $j = 0, 1, 2$ constants. A quadratic relationship also seemed to be a reasonable assumption since turbulence is a function of pressure, which varies in proportion to depth squared.

Figure 3.12 Natural Log of Energy Dissipation vs Depth.

Figure 3.13 Quadratic Regression and Analysis of Variance Table for Ln Energy Dissipation Versus Depth.

Figure 3.14 shows that the LOWESS curves (solid lines) for the linear (P = 1) smooths follow the general quadratic regression (dashed lines) for small values of F but flatten the pattern for large F's. The quadratic (P = 2) LOWESS curves close in on the regression line as F increases and produce a fairly good match as F reaches .75.

The quadratic LOWESS curve also appears to follow local peaks and valleys more accurately for small F's than does its linear counterpart. This is not unexpected. Figure 3.15 shows that the characteristically bowed shape of a quadratic curve produces larger $\hat{Y}_i$ values in the middle of a data set ($X_i$ is located in the middle of the LOWESS neighborhood) than a straight line fitted to the same data.
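The effect described above can be reproduced with ordinary least squares on a single bowed neighborhood. The sketch fits a straight line and a quadratic to points lying on an arch and evaluates both at the middle point; it uses unweighted fits to isolate the shape effect, whereas LOWESS itself weights the neighborhood points, and the function name is hypothetical.

```python
def polyfit_at(xs, ys, degree, x0):
    """Ordinary least-squares polynomial of the given degree, evaluated at x0.

    Solves the normal equations sum_k (sum_i x_i^(j+k)) c_k = sum_i y_i x_i^j
    by Gaussian elimination with partial pivoting (fine for degree 1 or 2).
    """
    m = degree + 1
    a = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(a[r][k] * coef[k] for k in range(r + 1, m))) / a[r][r]
    return sum(c * x0 ** j for j, c in enumerate(coef))


xs = [float(v) for v in range(11)]
ys = [25.0 - (v - 5.0) ** 2 for v in xs]  # an arch peaking at x = 5
print(polyfit_at(xs, ys, 1, 5.0), polyfit_at(xs, ys, 2, 5.0))
```

The quadratic reproduces the top of the arch (25), while the straight line, pulled toward the average of the whole neighborhood, gives only 15 at the same point.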

The "fits" of Figure 3.14 can be compared analytically, as was done in the Phase One test, by examining the distribution of their residuals. Combining these analytical results with graphical comparisons provides some goodness-of-fit measure for the two curves. The nonparametric Smirnov two-sample test [Ref. 12] is appropriate in this case because the distribution of the residuals is unknown. The results of this test, conducted at the 95% confidence level (Table III), indicate that there is no significant statistical difference between the F = .75 quadratic LOWESS curve and the quadratic least squares regression line; see the lower right-hand plot of Figure 3.14.
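The Smirnov two-sample statistic is the largest vertical gap between the two empirical distribution functions. The sketch below computes it together with the usual large-sample 5% critical value, $1.358\sqrt{(n+m)/(nm)}$; whether the thesis used this approximation or exact tables is not stated in this section, and the function names are hypothetical.

```python
import bisect
import math


def smirnov_statistic(a, b):
    """Two-sample Smirnov statistic D = max_t |F_a(t) - F_b(t)|.

    Both empirical CDFs only jump at observed values, so checking every
    observed value is enough to find the maximum gap.
    """
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for t in sa + sb:
        fa = bisect.bisect_right(sa, t) / len(sa)
        fb = bisect.bisect_right(sb, t) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def smirnov_critical(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to the 5% level."""
    return c_alpha * math.sqrt((n + m) / (n * m))
```

The decision rule follows the layout of Table III: the hypothesis of equal distributions is rejected when the statistic T exceeds the critical value.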

This example demonstrates that LOWESS works quite well on unequally spaced data. It also shows that quadratic LOWESS works better than the linear model when neighborhood sizes are too large to support the assumption that the neighborhood points are related linearly. Quadratic LOWESS should be used whenever the data suggests that this assumption is not true.


Figure 3.15 LOWESS Smoothing of [...] Energy Dissipation Data Using Linear and Quadratic Regressions in Step Three. (Panel title: ROBUST LOWESS SMOOTHING: ENERGY DISSIPATION DATA.)


TABLE III

Smirnov Test Comparing the Distribution of Residuals from Smoothing and Regression of Energy Data

Fit            T       Ks(.95)   Decision
lin            .216    .149      reject
[illegible]    .156    .149      reject
quad           .156    .149      reject
quad           .078    .149      accept


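The NEAR(1) process, derived by Lawrance and Lewis [Ref. 13], is a new first-order autoregressive time series model with exponentially distributed marginals. NEAR(1) data is generated as a simple linear combination of a series, $E_n$, of independent exponential random variables by the model

$$
X_n = \epsilon_n + \begin{cases}
B X_{n-1} & \text{w.p. } A \\
0 & \text{w.p. } (1 - A)
\end{cases}
\qquad n = 0, 1, 2, \ldots
$$

where

$$
\epsilon_n = \begin{cases}
E_n & \text{w.p. } \dfrac{1 - B}{1 - (1 - A)B} \\[6pt]
(1 - A)B\,E_n & \text{w.p. } \dfrac{AB}{1 - (1 - A)B}
\end{cases}
\qquad n = 0, 1, 2, \ldots
$$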


Figure 3.16 Lag-1 Plot of NEAR(1) Random Variables Having Autocorrelation .75.

These NEAR(1) variables have some interesting properties that make them especially suitable for testing smoothing routines. They have fixed serial lag-1 correlation, $\rho = AB$, and have conditional expectation

$$
E[X_n \mid X_{n-1} = x] = (1 - AB)\,\mu + AB\,x,
$$

where $\mu$ is the mean of the exponential marginal distribution. The following parameters were used to generate the variables for the test: A = .83, B = .9, λ = 1, which give AB = .747 ≈ .75 and μ = 1. A successful smooth of Figure 3.16 should therefore produce a straight line of the form

$$
Y = .25 + .75X,
$$

which is not at all what one would expect from looking at the plot.
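NEAR(1) variables of the kind plotted in Figure 3.16 can be simulated directly from the model definition. The sketch assumes the $E_n$ are exponential with rate λ, so the marginal mean is 1/λ; with A = .83, B = .9 and λ = 1 the sample mean should be near 1 and the lag-1 correlation near AB = .747. The function name `near1` and its arguments are hypothetical.

```python
import random


def near1(n, a, b, lam=1.0, seed=None):
    """Simulate n steps of a NEAR(1) process with Exponential(rate lam) marginals.

    X_n = eps_n + (b * X_{n-1} with probability a, else 0), where
    eps_n = E_n with probability (1 - b) / (1 - (1 - a) * b),
            (1 - a) * b * E_n otherwise, and E_n ~ Exponential(rate lam).
    """
    rng = random.Random(seed)
    p_plain = (1.0 - b) / (1.0 - (1.0 - a) * b)
    x = rng.expovariate(lam)  # start from the stationary marginal
    out = []
    for _ in range(n):
        e = rng.expovariate(lam)
        eps = e if rng.random() < p_plain else (1.0 - a) * b * e
        x = eps + (b * x if rng.random() < a else 0.0)
        out.append(x)
    return out
```

Plotting each simulated value against its predecessor gives a lag-1 plot of the same kind as Figure 3.16.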
Figure 3.17 presents comparison plots of robust and non-robust linear regression and of robust and non-robust LOWESS smoothing of the NEAR(1) data of Figure 3.16. The robust regression function contained in the IBM GRAFSTAT package was used in this example.

Examination of the plots in Figure 3.17 shows, once again, that LOWESS smooths are comparable to those produced by accepted linear regression techniques. It also reveals that neither the linear regression nor the LOWESS procedures were able to reproduce the true lag-1 relationship (Y = .25 + .75X) shown in the lower right-hand plot. Both robust curves do present an accurate picture of where most of the data points lie, and could be used to predict where a majority of future points are likely to fall. Relying on these curves, however, would probably lead to the conclusion that the points above and below these lines represent outliers, which may or may not be the case.
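The robust curves in Figure 3.17 downweight points with large residuals. Assuming the bisquare scheme of Cleveland's original LOWESS proposal (the robust step actually programmed is described in Chapter II), the robustness weights can be sketched as below: residuals are scaled by six times their median absolute value, and any point beyond that limit receives zero weight. The function name is hypothetical.

```python
import statistics


def bisquare_weights(residuals):
    """Robustness weights B(e / 6s), with s the median absolute residual.

    B(u) = (1 - u^2)^2 for |u| < 1 and 0 otherwise, so a residual larger than
    six times the typical residual size removes that point from the next pass.
    """
    s = statistics.median([abs(e) for e in residuals])
    if s == 0.0:  # the median absolute residual is zero
        return [1.0 if e == 0.0 else 0.0 for e in residuals]
    weights = []
    for e in residuals:
        u = e / (6.0 * s)
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return weights
```

In Cleveland's scheme these weights multiply the neighborhood weights and the local fits are recomputed; repeating the cycle a few times yields the robust smooth.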

It must be concluded from LOWESS' performance on these two data sets, however, that it smooths unequally spaced data as well as currently available regression techniques.
Figure 3.17 Comparison of Robust and Non-Robust Linear Regression and LOWESS Smoothing of the Lag-1 Plot of NEAR(1) Data.


IV. USING THE APL VERSION OF LOWESS

A.  OVERVIEW 

This chapter provides prospective users with detailed instructions for using LOWESS as a stand-alone program or in combination with the experimental GRAFSTAT graphics package. In either mode, LOWESS will provide the user with vectors of robust or non-robust smoothed $\hat{Y}_i$ values and their associated residuals. When used in conjunction with GRAFSTAT, it will also produce a scatter plot of the original data with the LOWESS smoothed curve superimposed. A similar presentation of the absolute value of the residuals versus $X_i$ is also available on request from the program, Figure 4.1.

Figure 4.1 Sample of Graphical Outputs from LOWESS: Smooths of the Data (left) and Residuals (right). (Panel title: NON-ROBUST LOWESS SMOOTHING; F = .7.)


LOWESS is a completely interactive program. All user-defined parameters and option selections are entered in response to program queries. The stand-alone and combined graphics modes of operation are differentiated only by their initial set-up procedures and by the choice of terminals on which the program is run.


Although no APL programming skills are required to operate LOWESS, users should become familiar with the system commands and procedures for entering the APL environment, loading and copying workspaces and variables, and saving workspaces by reading appropriate sections of [Ref. 14]. Operating instructions presented in the follow-on sections of this chapter have been written for users who have had little or no experience with APL. Experienced users may find it more convenient to refer to the summarized procedures presented in the tables at the end of this chapter.

LOWESS is not a W.R. Church Computer Center supported program and is not included in any of the APL libraries listed in [Ref. 15]. Interested users should contact Professor P.A.W. Lewis, Department of Operations Research, U.S. Naval Postgraduate School, for information concerning access to the APL workspace DTNLFNS. This workspace, which contains LOWESS and several other data analysis programs, should be copied and stored on the user's A disk.

B. TERMINAL REQUIREMENTS

LOWESS in the stand-alone mode can be run on any APL-capable terminal at the U.S. Naval Postgraduate School. The IBM GRAFSTAT software, which generates the graphical displays when LOWESS is operated in the combined graphics mode, requires either IBM 3277GA or 3278/79 graphics display terminals. The 3278 terminals require special modification to produce graphical displays. None of these terminals are available for public use at the Naval Postgraduate School. See Table IV for a summary.

C. PROGRAM INITIALIZATION: STAND-ALONE MODE

Since LOWESS is written in APL, users must enter the APL sub-environment after completing normal log-on procedures.

D. PROGRAM INITIALIZATION: COMBINED GRAPHICS MODE

As noted in Section B of this chapter, the combined LOWESS-GRAFSTAT package can only be run on IBM 3277GA, 3279 or specially configured 3278 graphics display terminals. Additionally, efficient operation of GRAFSTAT requires a minimum workspace size of 2 megabytes. The W.R. Church Computer Center has established a limited number of public domain workspaces with special account numbers and passwords to meet this need [Ref. 5]. Hard-copy graphics printers are available for use with the 3277GA terminals located in Ingersoll, Root and Spanagel Halls. The remainder of this section focuses on the use of the 3277GA terminals.

Data files stored on the user's personal disk are unavailable while operating in one of the public workspaces. Users may either:

1. send files to the public workspace's user number before logging on and commencing a work session; or

2. link to his/her own disk after logging on to the public workspace, using the CP link procedures outlined in [Ref. 17].

After logging on to one of the public workspaces and completing the data transfer or linking procedures described above, the user must enter the APL sub-environment by typing "APLGS7"1 and hitting the ENTER key. The response "CLEAR WS" indicates that the computer is ready to accept APL commands.

The special APL characters, labelled in black, are invoked by depressing the APL ON/OFF key. Since this key also turns the APL characters off, it may be necessary to check their status by trial and error. Detailed instructions for using the APL character set are presented in Section C of this chapter.

1 The command "APLGS7" invokes special system routines required to support the IBM GRAFSTAT software package. This procedure may change. Contact Professor P.A.W. Lewis, Department of Operations Research, if these procedures do not work.

The initialization procedure is completed by loading GRAFSTAT and LOWESS into the active APL workspace. GRAFSTAT should be loaded first, by entering the system command ")LOAD GRAFSTAT". The GRAFSTAT package is quite large and may take several minutes to load. The following set of user instructions will appear on the screen when GRAFSTAT is fully loaded:

THIS IS A NEW (5/1/84) RELEASE OF GRAFSTAT. IT RUNS ON THE
3277/GA OR ON THE 3278/79. IT HAS A NUMBER OF NEW FUNCTIONS.
YOUR OLD CONTROL VECTORS WILL WORK AS BEFORE. IF YOU )COPY
RATHER THAN )LOAD THIS WORKSPACE YOU MUST EXECUTE THE
FUNCTION LATENT BEFORE STARTING. THE NEXT RELEASE IS
SCHEDULED FOR 7/84.

TO BEGIN, TYPE: START

FOR MORE INFORMATION, TYPE: DESCRIBE

It is not necessary for the user to start, or even interact with, GRAFSTAT to smooth a set of data: the GRAFSTAT message may be cleared by depressing the CLEAR key.

Users who have the APL workspace DTNLFNS stored on the public workspace disk, or who are linked to their own personal disk where it is stored, need only enter ")PCOPY DTNLFNS LOWESS" to complete the initialization process. The computer responds by presenting WS size and date-saved information when all programs have been loaded. Initialization is now complete, and the user is ready to execute LOWESS by typing "LOWESS" and hitting ENTER. From this point on, user entries are made in response to program queries or instructions. See Table VI for a summary of these procedures.


E. OPERATION OF LOWESS


This  section  provides  detailed  descriptions  of  the  user 
inputs  required  during  normal  operation  of  LOWESS.  The 
discussion  assumes  that  one  of  the  initialization  procedures 
described  in  Sections  C  and  D  of  this  chapter  has  already 
been  completed. 

Execution of the LOWESS program is initiated by typing "LOWESS" and hitting the return key. Since the program is interactive, it will respond with a series of queries or instructions requesting the user to input data or make decisions about the operation of the program. The exact sequence of program-initiated queries and instructions depends on the user's inputs.

User-computer interactions required during execution of LOWESS fall into two categories: data input and program operation.

Since the program cannot operate without data, the initial concern of LOWESS is to locate and read the data set it is about to smooth. Data can be read from the active APL workspace, a stored APL workspace or a stored CMS file. Data that is not located in the active workspace must be accessible from that workspace. This presents no problem when the user is operating under his/her personal user number and the data is stored on his/her disk. It may become a problem when the user is logged on to one of the public workspaces described in Section D of this chapter and has not:

1. sent the data to the public workspace where he/she is working and stored it on the associated A disk; or

2. linked to his/her own disk prior to entering the APL sub-environment (see Section D of this chapter).

Wherever the data is stored, it MUST be formatted into two separate lists, one containing the X values and the other containing the corresponding Y values of the points being smoothed.

Data which resides in the active workspace as APL vectors1 is entered into LOWESS when the user types the variable name and hits ENTER in response to the appropriate program requests.

Data which is stored in another APL workspace on the disk in use, or on a disk to which the user is linked, will be transferred to the active workspace by the sub-program DATAINPUT. The user needs only to enter the workspace name and variable names when requested. DATAINPUT will also read and convert CMS files stored on the disk in use or on a disk to which the user is linked, provided they are formatted as described above and contain only numerical data. A mixture of alphabetic and numeric characters in a CMS data file will create an error and terminate execution of LOWESS. These data transfer features work equally well in either mode of operation. The IBM GRAFSTAT program contains functions entitled CMS READ and CMS WRITE that will convert data in both directions when operating in the combined graphics mode. Users will generally not need to use this feature of GRAFSTAT, however.

Program operation inputs include:

1. the value of the parameter F (selection considerations are discussed in Chapter II, Section C);

2. whether robust or non-robust smoothing is desired;

3. whether or not a plot of the original data and smoothed curve is desired;

4. whether or not a plot of the absolute values of the residuals and associated smoothed curve is desired;

5. X and Y axis labels for these plots.

1 In APL, a list of data points stored under a single variable name is referred to as a vector. See [Ref. 14] for further details.

Plots can only be generated while operating LOWESS in the combined graphics mode. Requesting plots when GRAFSTAT has not been loaded will produce an error and terminate execution. Hard copies of plots may be obtained by depressing the HARD COPY button on the bottom of the graphics screen.


TABLE IV

Summary of Terminal Requirements and Available Outputs

Mode                Terminal Required       Additional Software    Available Output
                                            Required
Stand-Alone         3277GA, 3278, 3279      none                   Numerical:
                                                                     YSMTH .. smoothed Y
                                                                     X1 ..... original X
                                                                     Y1 ..... original Y
                                                                     RESY ... residuals
Combined Graphics   3277GA, 3279, or 3278   IBM GRAFSTAT program   Numerical:
                    with graphics board                              YSMTH, X1, Y1, RESY
                                                                   Graphical:
                                                                     smoothed curve
                                                                     |residuals| vs Xi


TABLE V

Initialization Procedures, Stand-Alone Mode

Objective                              User Inputs              Program Response
(1) enter APL environment              "APL"                    "CLEAR WS"
(2) invoke APL characters              APL ON/OFF key           none
(3) load LOWESS and assoc. programs    )PCOPY DTNLFNS LOWESS    "saved (date) (time)"

TABLE VI

Initialization Procedures, Combined Graphics Mode

Objective                    User Inputs               Program Response
(1) enter APL environment    "APLGS7"                  "CLEAR WS"
(2) invoke APL characters    APL ON/OFF key            none
(3) load GRAFSTAT            ")LOAD GRAFSTAT"          initialization screen (see Section D)
(4) load LOWESS              ")PCOPY DTNLFNS LOWESS"   "saved (time) (date)"
(5) execute                  "LOWESS"

V. THE FORTRAN VERSION OF LOWESS


A.  OVERVIEW 

This chapter provides prospective users with detailed instructions for using a FORTRAN program that accomplishes the LOWESS curve smoothing procedure described in Chapter II. The program, entitled LOWESS, will provide the user with CMS files containing robust or non-robust smoothed $\hat{Y}_i$ values and their associated residuals. These data files can be used to create plots of the raw and smoothed data points using DISPLA [Ref. 7], EASYPLOT, or other W.R. Church Computer Center supported IMSL or non-IMSL plotting routines.

LOWESS is a completely interactive program. All user-defined parameters and option selections are entered in response to program queries.

Although no FORTRAN programming skills are required to operate LOWESS, users should become familiar with FORTRAN and RATFIV operating system commands, and also with the basic XEDIT editor, by reading appropriate sections of [Ref. 18] and [Ref. 19]. A limited ability to format, edit with XEDIT, and manipulate data files will be helpful when using LOWESS or when interacting with any of the plotting routines mentioned earlier.

B.  TERMINAL  REQUIREMENTS 

LOWESS can be run on any remote terminal attached to the
IBM computer located at the Naval Postgraduate School. The
DISPLA and EASYPLOT plotting routines require the IBM 3277GA
graphics display terminals located in Ingersoll, Root and
Spanagel Halls. Plotting routines that use the remote
VERSATEC or line printers can be accessed from any terminal.


C.  PROGRAM INITIALIZATION (FORTRAN VERSION)

Since LOWESS is not a W.R. Church Computer Center
supported program, it is not available in any of the
center's public access libraries. Interested users should
contact Professor P.A.W. Lewis, Department of Operations
Research, U.S. Naval Postgraduate School, for information
concerning access to LOWESS and its supporting programs.
Copies of the programs listed in Table VII should be
obtained and stored on the user's A disk. Annotated copies
of the source codes are contained in Appendix B.


TABLE VII

Programs and Subroutines Required for the
Operation and Support of the FORTRAN Version of LOWESS

Filename      Filetype      Filemode

LOWESS        FORTRAN       A1
LOWS          EXEC          A1
PXSORT        FORTRAN       A1
LLBQF         FORTRAN       A1

PXSORT and LLBQF are contained in the IMSL library.
Users having access to these programs through the W.R.
Church Computer Center need not obtain personal copies.

The LOWS EXEC is used to activate the system libraries
and to designate the CMS storage space required for the
LOWESS input and output files. It is invoked by typing
"LOWS EXEC" and pressing the ENTER key. The file definitions
contained in the LOWS EXEC are listed in Table VIII. See
[Ref. 17] for information on the use of EXEC executive
programs.

This EXEC defines enough file space to accommodate five
data sets. The user need only enter the appropriate file
number when queried by LOWESS to smooth any of the data sets.


TABLE VIII

Input and Output File Definitions Used in LOWS

File number      Filename      Filetype

     2           LOW2          DATA
     3           LOW3          DATA
     4           LOW4          DATA
     7           LOW7          DATA
     8           LOW8          DATA


It may become necessary to change these filenames to
avoid losing data when smoothing a large number of data sets
or when smoothing one set a number of times. This may be
accomplished in one of the following ways:

1.  by entering the CMS command "XEDIT LOWS EXEC" and
    changing the appropriate names;

2.  by using the CMS command "R (old filename) (old
    filetype) (old filemode) (new filename) (new filetype)
    (new filemode)" for each file that needs to be changed;
    see [Ref. 18].

File management is important. It is absolutely
imperative that data input files have the same filename,
filetype and filemode listed in the LOWS EXEC, both to
prevent inadvertent smoothing of the wrong data and to
prevent program errors.

D.  DATA FILES (FORTRAN VERSION)

LOWESS requires that data be input in two columns of
floating point constants in (2F15.5) format, X values on the
left and Y values on the right. This is accomplished by
creating a new file with the command "XEDIT (filename)
(filetype)." The filename and filetype chosen should be one
of those listed in Table VIII or one that is contained in
the user's own LOWS EXEC. Refer to [Ref. 19], chapter 2, for
more detailed instructions on creating files. The (2F15.5)
format requires that all input variables contain a decimal
point followed by no more than five decimal places. The X
values must be entered in the first fifteen spaces and the Y
values in the second fifteen spaces of each line (one (X, Y)
pair per line).
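The fixed-column layout can be illustrated with a short sketch. It assumes the input file is prepared on a modern system in Python instead of being typed into XEDIT, and the function names `format_2f15_5` and `write_lowess_input` are hypothetical. For values that fit, Python's `15.5f` field gives the same two right-justified, fifteen-character fields with five decimal places that (2F15.5) expects.

```python
def format_2f15_5(points):
    """Format (x, y) pairs as (2F15.5) lines: X right-justified in
    columns 1-15 and Y right-justified in columns 16-30."""
    lines = []
    for x, y in points:
        line = f"{x:15.5f}{y:15.5f}"
        if len(line) != 30:
            # FORTRAN would print asterisks for a value too wide for F15.5
            raise ValueError(f"value too wide for an F15.5 field: {x!r}, {y!r}")
        lines.append(line)
    return lines


def write_lowess_input(path, points):
    """Write one (X, Y) pair per line to the file at `path`."""
    with open(path, "w") as fh:
        for line in format_2f15_5(points):
            fh.write(line + "\n")


if __name__ == "__main__":
    for line in format_2f15_5([(0.2, -0.398), (10.2, 8.696)]):
        print(repr(line))
```

The resulting file would still need to be moved to the CMS A disk under one of the names in Table VIII, for example LOW2 DATA.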

The output from LOWESS is placed in a file designated by
the user. This can be the same file used for inputting the
(X, Y) values or a different one. A different file should be
used if the same data set is going to be smoothed with
several different parameters. This output is printed in
(4F15.3) format. The first column contains the original X
values ordered from smallest to largest. Column two contains
the corresponding Y values, column three contains the
smoothed Ŷi values and column four contains the (Yi - Ŷi)
residuals.
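Reading the output back is the reverse operation. The sketch below assumes that each line holds exactly the four fifteen-character (4F15.3) fields described above, with no leading carriage-control column. The name `parse_4f15_3` is hypothetical. The FORTRAN listing in Appendix B appears to write a leading blank and only three fields, so the column offsets may need adjusting for that version.

```python
def parse_4f15_3(lines):
    """Split (4F15.3) lines into (x, y, smoothed_y, residual) tuples."""
    rows = []
    for line in lines:
        if not line.strip():
            continue                          # skip blank lines
        fields = [line[k:k + 15] for k in range(0, 60, 15)]
        x, y, ys, resid = (float(f) for f in fields)
        rows.append((x, y, ys, resid))
    return rows


if __name__ == "__main__":
    sample = f"{0.2:15.3f}{-0.398:15.3f}{0.1:15.3f}{-0.498:15.3f}"
    print(parse_4f15_3([sample]))
```

As a check on a parsed row, column four should equal column two minus column three, to within the three printed decimals.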

E.  OPERATION  OF  LOWESS  (FORTRAN  VERSION) 

This section provides detailed descriptions of the user
inputs required during normal operation of LOWESS. The
discussion assumes that the LOWS EXEC has been properly
prepared and executed and that input files have been built
according to the instructions presented in Sections C and D
of this chapter.

Execution of the LOWESS program is initiated by typing
"WATFIV LOWESS * (XTIEE". Since the program is interactive,
it will respond with a series of queries or instructions
requesting the user to input data or make decisions about
the operation of the program.

The initial concern of LOWESS is to locate and read the
data set it is about to smooth. Data can only be read from
one of the files defined in the LOWS EXEC routine. The user
tells LOWESS which file to read by entering the appropriate
file number (2, 3, 4, 7 or 8) in response to the instruction
"ENTER THE FILE NUMBER OF THE INPUT DATA FILE." The program
will terminate with an error if the LOWS EXEC was not
properly prepared or if the data file was not formatted as
described in the preceding section. Other program-requested
inputs include:

1.  the value of the parameter F (selection considerations
    are discussed in Chapter II, Section C);

2.  whether robust or non-robust smoothing is desired;

3.  the file number of the desired output file.
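The parameter F is turned into a neighborhood size: each local fit uses the q points nearest the point being smoothed. The program listings in Appendices A and B appear to compute q by rounding nF to the nearest integer. A minimal sketch of that step follows; the function name `strip_size` is hypothetical.

```python
import math


def strip_size(n, f):
    """Points per local strip: n*F rounded to the nearest integer."""
    if not 0.0 < f <= 1.0:
        raise ValueError("F must satisfy 0 < F <= 1")
    return math.floor(n * f + 0.5)


if __name__ == "__main__":
    print(strip_size(472, 0.33))   # 472 data points smoothed with F = .33
```

Larger values of F give longer strips and therefore smoother curves.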


APPENDIX A
APL PROGRAMS


This Appendix contains annotated listings of the APL
programs written for this thesis. Source listings of the
system library programs used to support the CMSREAD function
called in the program DATAINPUT are not included.

LOWESS is an interactive program that executes the
Robust Locally Weighted Regression Scatter-Plot Smoothing
procedure described in the preceding sections of this
paper. During execution it calls the following subprograms:
DATAINPUT, REPEATCK, REGRES, REGRES2, PLOTQUERY and LOWS.
Refer to Chapter IV for detailed user instructions.
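For readers without access to an APL system, the procedure can be summarized in a short Python sketch. It assumes the steps the listings in this appendix follow: points sorted by X, a strip of the q nearest neighbors that slides to the right, tricube distance weights, a local weighted straight-line fit, and bisquare robustness weights scaled by six times the median absolute residual. The sketch is an illustration, not a transcription, and all names are hypothetical. Only the linear (not the quadratic) local fit is shown. The weights also enter the normal equations directly, whereas the listings appear to use their square roots, so the numbers need not match the programs exactly.

```python
import math
from statistics import median


def tricube(u):
    """Tricube weight (1 - u^3)^3 for 0 <= u < 1, zero otherwise."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    """Bisquare weight (1 - u^2)^2 for |u| < 1, zero otherwise."""
    return (1.0 - u ** 2) ** 2 if abs(u) < 1.0 else 0.0


def local_line(xs, ys, ws, x0):
    """Evaluate the weighted least-squares line through (xs, ys) at x0."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    den = sw * sxx - sx * sx
    if abs(den) < 1e-12:                    # nearly singular: weighted mean
        return sy / sw
    slope = (sw * sxy - sx * sy) / den
    return (sy - slope * sx) / sw + slope * x0


def lowess(x, y, f=0.5, iterations=2):
    """Return (xs sorted ascending, smoothed values); iterations=0 is non-robust."""
    pts = sorted(zip(x, y))
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    n = len(xs)
    q = min(n, max(1, math.floor(n * f + 0.5)))   # points per local strip
    rob = [1.0] * n                                # robustness weights
    fitted = [0.0] * n
    for _ in range(iterations + 1):
        a = 0                                      # strip is xs[a:a + q]
        for i in range(n):
            # advance the strip while its next right-hand point is nearer
            # to xs[i] than its current left-hand point
            while a + q < n and xs[a + q] - xs[i] < xs[i] - xs[a]:
                a += 1
            idx = range(a, a + q)
            d = max(abs(xs[i] - xs[j]) for j in idx)
            if d == 0:                             # strip has a single X value
                same = [ys[j] for j in range(n) if xs[j] == xs[i]]
                fitted[i] = sum(same) / len(same)
                continue
            tri = [tricube(abs(xs[i] - xs[j]) / d) for j in idx]
            ws = [t * rob[j] for t, j in zip(tri, idx)]
            if sum(ws) == 0.0:                     # every robustness weight is zero
                ws = tri
            fitted[i] = local_line([xs[j] for j in idx],
                                   [ys[j] for j in idx], ws, xs[i])
        resid = [yv - fv for yv, fv in zip(ys, fitted)]
        m = median([abs(r) for r in resid])
        rob = [bisquare(r / (6.0 * m)) if m > 0 else 1.0 for r in resid]
    return xs, fitted
```

Calling `lowess(x, y, f, iterations=0)` gives the non-robust smooth. With `iterations=2` the sketch makes one initial fit followed by two robustness passes, which is the number of passes the listings appear to perform.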


∇LOWESS;N1;Q;WX;J;I;A;B;STRP;U;D;TX;WT;Z;BR;DA;DB;R;U1;M;RO;AR;RHS;PROCEED;PT;SKP;YS;F;ROB;REG;XAXIS;YAXIS;PHDR;QS3;QS4
  ⍝ DO NOT MOVE OR ERASE: GRAFSTAT FUNCTION HEADER
  ⍝ GRAFSTAT WILL NOT ADD A LINE TO THIS FUNCTION WITHOUT
  ⍝ THIS HEADER
  ⍝⍝⍝
  ⍝ LOWESS CALLS THE FOLLOWING PROGRAMS AND VARIABLES:
  ⍝ DATAINPUT, REPEATCK, PLOTQUERY, REGRES, REGRES2, RPLT,
  ⍝ NRPLT, RESPLT, SRESPLT
  ⍝⍝⍝
  ⎕PP←6
  DATAINPUT
  →L0×⍳(PROCEED='Y')
  →0
L0:Y1←Y←Y[⍋X]                       ⍝ ORDER DATA
  X1←X←X[⍋X]
  'INPUT F ... (0<F≤1)'
  F←⎕
  Q←⌊0.5+(N1←⍴X)×F
  'DO YOU WANT TO USE LINEAR OR QUADRATIC FITTING DURING'
  'THIS SMOOTHING ROUTINE?'
  '(LIN OR QUAD)'
  REG←1↑⍞
  'DO YOU WANT TO USE THE ROBUST SMOOTHING OPTION?'
  '(YES OR NO)'
  ROB←1↑⍞
  YS←N1⍴0
  WX←N1⍴1
  J←0                               ⍝ COUNTER FOR ROBUST SMOOTHING LOOP
L1:J←J+1
  I←0
  A←1                               ⍝ STARTS FIRST STRIP AT X1 ... XQ
  B←Q
L2:I←I+1                            ⍝ INCREMENTS THROUGH X1 ... XN
  →L6×⍳(I>N1)
  REPEATCK                          ⍝ PREVENTS COMPUTATIONS OF Ŷ FOR REPEAT Xi
  →L5×⍳(SKP='Y')
  STRP←A+0,⍳(B-A)
  →L3×⍳0≠D←⌈/|U←X[I]-X[STRP]        ⍝ COMPUTES D
  YS[I]←(+/LST/Y)÷(+/LST←X=X[I])    ⍝ USES AVG Y IF D = 0
  →L5
L3:WT←WX[STRP]×TX←((1-(|U)*3)*3)×((|U←U÷D)<1)   ⍝ TRICUBE WT FCN
L4:→R2×⍳(REG≠'L')
  X[STRP] REGRES Y[STRP]            ⍝ WEIGHTED REGRESSIONS
  →L5
R2:X[STRP] REGRES2 Y[STRP]
L5:→L2×⍳(B≥N1)∨(I≥N1)
  →L2×⍳((DA←X[I+1]-X[A])≤(DB←X[B+1]-X[I+1]))
  A←A+1                             ⍝ ADVANCE STRIP
  B←B+1
  →L5
L6:RO←R[⍋|R←RESY←Y-YS]
  →L10×⍳(0≠M←0.5×+/|RO[(⌈N1÷2),1+⌊N1÷2])
  U1←0×R
  →L11
L10:U1←R÷(6×M)
L11:WX←((1-(U1*2))*2)×((|U1)<1)     ⍝ BISQUARE WT FCN
  →L7×⍳(ROB≠'Y')
  →L1×⍳(J≤2)
L7:YSMTH←YS                         ⍝ SAVE SMOOTHED VALUES BEFORE PLOTTING
  PLOTQUERY                         ⍝ RUN PLOTS
  →L8×⍳(PT≠'Y')
L8:'THE OUTPUT FROM THIS LOWESS SMOOTHING IS STORED UNDER THE'
  'FOLLOWING VARIABLE NAMES:'
  '   YSMTH . SMOOTHED Y VALUES'
  '   X1    . X VALUES ARRANGED IN ASCENDING ORDER'
  '   Y1    . ORIGINAL Y VALUES'
  '   RESY  . RESIDUALS'
∇


DATAINPUT controls the data entry portion of the
procedure. Data and program operating parameters are entered
in response to program queries. DATAINPUT accepts data that
are stored in the active APL workspace, transfers data from
other APL workspaces and converts CMS data into APL.
∇DATAINPUT;QS1;QS2;QS3
  PROCEED←'Y'
  ' '
  'IS YOUR DATA SET LOCATED IN THIS WORKSPACE?'
  '(YES OR NO)'
  QS1←1↑⍞
  →LP1×⍳(QS1='N')
  'ENTER THE NAME OF THE X VARIABLE'
  X←⎕
  'ENTER THE NAME OF THE Y VARIABLE'
  Y←⎕
  →END
LP1:'IS YOUR DATA LOCATED:'
  '  (1) IN AN APL WORKSPACE LOCATED ON THIS DISK OR ON A DISK'
  '      THAT YOU ARE LINKED TO,'
  '  (2) IN A CMS FILE ON THIS DISK OR ON A DISK THAT YOU ARE'
  '      LINKED TO,'
  '  (3) NEITHER (1) OR (2) ABOVE.'
  'ENTER (1, 2 OR 3)'
  QS2←⎕
  →(LP2,LP3,LP4)[QS2]
LP2:'TO TRANSFER YOUR DATA TO THIS WORKSPACE:'
  '  (1) TYPE ...)PCOPY (WS NAME) (X VARIABLE NAME) (Y VARIABLE NAME)'
  '      EXAMPLE: )PCOPY DATA X Y'
  '      IF YOUR DATA IS STORED AS TWO SEPARATE VARIABLES'
  '  (2) TYPE ...)PCOPY (WS NAME) (VARIABLE NAME)'
  '      EXAMPLE: )PCOPY DATA ARRAY'
  '      IF YOUR DATA IS STORED UNDER A SINGLE VARIABLE NAME'
  '      AS IN A TWO DIMENSIONAL ARRAY'
  ' '
  '  DATE AND TIME SAVED INFORMATION IS DISPLAYED'
  '  WHEN THE TRANSFER IS COMPLETE. THEN ENTER →GO'
  '  TO CONTINUE THE LOWESS SMOOTHING PROGRAM'
  S∆DATAINPUT←GO
GO:'DO YOU NEED TO DEFINE YOUR X AND Y VARIABLES ANY FURTHER?'
  'ANSWER NO IF YOU ENTERED SEPARATE X AND Y VARIABLE NAMES'
  'IN THE PRECEDING STEP. OTHERWISE ANSWER YES.'
  '(YES OR NO)'
  QS3←1↑⍞
  →END×⍳(QS3='N')
  'DEFINE THE X VARIABLE'
  X←⎕
  'DEFINE THE Y VARIABLE'
  Y←⎕
  →END
LP3:'TO TRANSFER YOUR CMS DATA FILE TO THIS WORKSPACE:'
  '  (1) ANSWER THE FOLLOWING QUESTIONS ABOUT YOUR X DATA FILE'
  X←CMSREAD
  '  (2) ANSWER THE FOLLOWING QUESTIONS ABOUT YOUR Y DATA FILE'
  Y←CMSREAD
  'YOU ARE NOW READY TO PROCEED WITH LOWESS'
  →END
LP4:'YOUR DATA MUST BE STORED IN AN APL WORKSPACE OR IN A CMS FILE'
  'LOCATED ON THIS DISK OR ON A DISK TO WHICH YOU ARE LINKED. LOWESS'
  'IS BEING TERMINATED. PLEASE COMPLY WITH CONDITION (1) OR (2)'
  'AND REINITIATE LOWESS.'
  PROCEED←'N'
END:S∆DATAINPUT←0
∇
REPEATCK reduces the number of computations required to
smooth a data set by assigning the same smoothed Y value to
data points that have the same X value.
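The saving REPEATCK provides can be sketched as follows. The sketch assumes the X values are already sorted, so repeated values are adjacent. `fill_repeats` and its `fit_at` callback are hypothetical names; the callback stands in for the local regression.

```python
def fill_repeats(xs, fit_at):
    """Smooth sorted xs, calling fit_at(i) only for the first point of each
    run of equal X values and copying that result to the rest of the run."""
    smoothed = []
    calls = 0
    for i, x in enumerate(xs):
        if i > 0 and x == xs[i - 1]:
            smoothed.append(smoothed[-1])   # repeated X: reuse previous value
        else:
            smoothed.append(fit_at(i))
            calls += 1
    return smoothed, calls


if __name__ == "__main__":
    xs = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
    values, calls = fill_repeats(xs, lambda i: 10.0 * xs[i])
    print(values, calls)
```

Here six points require only three local fits.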


∇REPEATCK
  SKP←'N'
  →END×⍳(I≤1)
  →END×⍳(X[I]≠X[I-1])
  YS[I]←YS[I-1]
  SKP←'Y'
END:
∇


PLOTQUERY controls the graphical output when operating
with the IBM GRAFSTAT statistical graphics package. It calls
the subprogram LOWS to smooth the absolute values of the
(Yi - Ŷi) residuals obtained from smoothing the original
data.


∇PLOTQUERY
  ' '
  'DO YOU WANT A PLOT OF YOUR LOWESS SMOOTHED CURVE?'
  '(YES OR NO). ENTER NO IF NOT USING GRAFSTAT'
  PT←1↑⍞
  →END×⍳(PT≠'Y')
  'INPUT X AXIS LABEL'
  XAXIS←⍞
  'INPUT Y AXIS LABEL'
  YAXIS←⍞
  →PL1×⍳(ROB≠'Y')
  PHDR←'ROBUST LOWESS SMOOTHING; F = ',⍕F
  RUN RPLT
  →PL2
PL1:PHDR←'NON-ROBUST LOWESS SMOOTHING; F = ',⍕F
  RUN NRPLT
PL2:'DO YOU WANT A PLOT OF |RESIDUALS| VS X?'
  '(YES OR NO)'
  QS3←1↑⍞
  →END×⍳(QS3≠'Y')
  'DO YOU WANT THIS PLOT SMOOTHED?'
  '(YES OR NO)'
  QS4←1↑⍞
  →PL3×⍳(QS4≠'Y')
  X LOWS (|RESY)
  RUN SRESPLT
  →END
PL3:RUN RESPLT
END:
∇


LOWS is used to smooth the (Yi - Ŷi) residuals obtained
from smoothing the original data set. It operates exactly
like LOWESS except for the data input and graphical output
sections.


∇X LOWS Y;N1;Q;WX;J;I;A;B;STRP;U;D;TX;WT;Z;BR;DA;DB;R;U1;M;RO;AR;RHS;YZ
  Y←Y[⍋X]
  X←X[⍋X]
  Q←⌊0.5+(N1←⍴X)×F
  YS←N1⍴0
  WX←N1⍴1
  J←0
L1:J←J+1
  I←0
  A←1
  B←Q
L2:I←I+1
  →L6×⍳(I>N1)
  REPEATCK
  →L5×⍳(SKP='Y')
  STRP←A+0,⍳(B-A)
  →L3×⍳0≠D←⌈/|U←X[I]-X[STRP]
  WT←WX[STRP]×TX←Q⍴1
  YS[I]←(+/LST/Y)÷(+/LST←X=X[I])
  →L5
L3:WT←WX[STRP]×TX←((1-(|U)*3)*3)×((|U←U÷D)<1)
L4:→R2×⍳(REG≠'L')
  X[STRP] REGRES Y[STRP]
  →L5
R2:X[STRP] REGRES2 Y[STRP]
L5:→L2×⍳(B≥N1)∨(I≥N1)
  →L2×⍳((DA←X[I+1]-X[A])≤(DB←X[B+1]-X[I+1]))
  A←A+1
  B←B+1
  →L5
L6:RO←R[⍋R←|Y-YS]
  →L10×⍳(0≠M←0.5×+/|RO[(⌈N1÷2),1+⌊N1÷2])
  U1←0×R
  →L11
L10:U1←R÷(6×M)
L11:WX←((1-(U1*2))*2)×((|U1)<1)
  →L12×⍳(ROB≠'Y')
  →L1×⍳(J≤2)
L12:
∇


REGRES computes weighted linear least squares regressions
of Y on X, while REGRES2 computes weighted quadratic least
squares regressions of Y on X.
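Both regressions reduce to small systems of weighted normal equations. The Python sketch below follows that idea. The function names are hypothetical. The weights `ws` are used exactly as passed; the listings appear to pass square roots of the combined weights, and a caller could reproduce that with `[w ** 0.5 for w in ws]`. When the determinant is tiny, the linear fit falls back to the plain average of the Y values, as REGRES appears to do when |DEN| ≤ 0.0001. The quadratic case solves its 3×3 system by Gaussian elimination, standing in for APL's matrix divide.

```python
def weighted_linear_fit(xs, ys, ws, tol=1e-4):
    """Return (intercept, slope) of the weighted least-squares line."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    den = sw * sxx - sx * sx
    if abs(den) <= tol:                 # nearly singular: plain average
        return sum(ys) / len(ys), 0.0
    slope = (sw * sxy - sx * sy) / den
    return (sy - slope * sx) / sw, slope


def solve(a, b):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (m[r][n] - sum(m[r][k] * c[k] for k in range(r + 1, n))) / m[r][r]
    return c


def weighted_quadratic_fit(xs, ys, ws):
    """Return [c0, c1, c2] minimizing sum of w * (y - c0 - c1*x - c2*x^2)^2."""
    s = [sum(w * x ** p for w, x in zip(ws, xs)) for p in range(5)]
    t = [sum(w * y * x ** p for w, x, y in zip(ws, xs, ys)) for p in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return solve(normal, t)
```

The smoothed value at the strip's center point is then the fitted line or parabola evaluated at that X.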


∇XR REGRES YR;DEN;W1;B1;B2
  DEN←((+/W1)×(+/W1×XR*2))-((+/XR×W1←WT*0.5)*2)
  →L1×⍳((|DEN)>0.0001)
  YS[I]←(+/YR)÷⍴YR
  →0
L1:B2←(((+/W1)×(+/W1×XR×YR))-((+/W1×XR)×(+/W1×YR)))÷DEN
  B1←((+/W1×YR)-B2×(+/W1×XR))÷(+/W1)
  YS[I]←B1+B2×X[I]
∇


∇X2 REGRES2 Y2
  A1←+/X2×(WT*0.5)
  A2←+/(X2*2)×(WT*0.5)
  A3←+/(X2*3)×(WT*0.5)
  AR2←3 3⍴(+/WT*0.5),A1,A2,A1,A2,A3,A2,A3,(+/(X2*4)×(WT*0.5))
  RHS2←(+/Y2×WT*0.5),(+/X2×Y2×WT*0.5)
  RHS2←3 1⍴RHS2,(+/(X2*2)×Y2×WT*0.5)
  BR←RHS2⌹AR2
  YS[I]←BR[1;1]+(BR[2;1]×X[I])+(BR[3;1]×X[I]*2)
∇


The following character strings are the screen vectors
used by the RUN function of GRAFSTAT to produce the plots of
the LOWESS smoothed curves of the original data and of the
absolute values of the residuals.

••NRPLT  73  CHARACTER 

M«Xt«YtjYS»e  *9PHDR9XAXIS9YAXIS92I9LIN9LIN9<  |  1*0  I 

•  • 


ppRESPLT  89  CHARACTER 

MI9X9< |RESY)999t9.HtiVAoa4t(i' '9XAXIS9' IRESIDUALSI ’9229LIN9LIN9I  I 
196  I  9  99 


■■RPLT  73  CHARACTER 

M9XI9YI  i  YS98  1919. •  tiVAoaW  '9PHDR9XAXIS9YAXIS92<9LIN9UN9I  I  <90  I 

9  9 


•■SRESPLT  83  CHARACTER 

•»I9X9< IRESY) jYS99 

l9l9.»*«?AO#4t9* *9* ‘9XAXIS9' IRESIDUALSI '9229LIH9LIH9I  I  <90  I  9 
99 
APPENDIX B
FORTRAN PROGRAMS


This appendix contains a listing of the FORTRAN program
and subroutine written to support this thesis. The IMSL
programs LLBQF and PXSORT, which support the LOWESS program,
are not listed. Detailed user instructions for operating
these programs are contained in Chapter V.


C     LOWESS - ROBUST LOCALLY WEIGHTED SCATTER-PLOT SMOOTHING
      DIMENSION X(200),Y(200),YS(200),R(200),R1(200),WX(200),
     1          TX(200),WT(200),U(200),A(2,2),B(2,1),BETA(2,1),WK(10)
      INTEGER AX,BX,A1,Q,I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,N,IWK(2),IER,
     1        ROB,IF1,IF2
      REAL MED
      DATA WX/200*1.0/
      DATA AX/1/,ROB/-1/,N/0/
      F=.33
      IF1=2
      IF2=4
      N=0
C     READ THE (X,Y) PAIRS UNTIL END OF FILE
    1 N=N+1
      READ (IF1,901,END=2) X(N),Y(N)
      GO TO 1
    2 N=N-1
      CALL XYSORT (X,Y,1,N)
      Q=IFIX((FLOAT(N)*F)+.5)
C     START OF ONE SMOOTHING PASS
    4 CONTINUE
      AX=1
      BX=Q
      A1=AX-1
      DO 65 I1=1,N
      I2=0
      D=0.0
C     D = LARGEST DISTANCE FROM X(I1) TO A POINT IN THE STRIP
      DO 10 I3=AX,BX
      I2=I2+1
      U(I2)=X(I1)-X(I3)
      IF (.NOT. ABS(U(I2)).GE.D) GO TO 5
      D=ABS(U(I2))
    5 CONTINUE
   10 CONTINUE
      IF (.NOT. D.GT.0.00001) GO TO 30
C     TRICUBE WEIGHTS TIMES ROBUSTNESS WEIGHTS
      DO 25 I4=1,Q
      U1=ABS(U(I4)/D)
      IF (.NOT. U1.LT.1.0) GO TO 15
      TX(I4)=(1.0-(U1**3))**3
      WT(I4)=WX(A1+I4)*TX(I4)
      GO TO 20
   15 CONTINUE
      TX(I4)=0.0
      WT(I4)=0.0
   20 CONTINUE
   25 CONTINUE
      GO TO 40
   30 CONTINUE
      DO 35 I5=1,Q
      TX(I5)=1.0
      WT(I5)=WX(A1+I5)
   35 CONTINUE
   40 CONTINUE
C     NORMAL EQUATIONS FOR THE WEIGHTED LINEAR FIT
      A(1,1)=0.0
      A(1,2)=0.0
      A(2,1)=0.0
      A(2,2)=0.0
      B(1,1)=0.0
      B(2,1)=0.0
      DO 45 I6=1,Q
      I7=A1+I6
      W=SQRT(WT(I6))
      A(1,1)=A(1,1)+W
      A(1,2)=A(1,2)+(X(I7)*W)
      A(2,2)=A(2,2)+(W*(X(I7)**2))
      B(1,1)=B(1,1)+(Y(I7)*W)
      B(2,1)=B(2,1)+(Y(I7)*X(I7)*W)
   45 CONTINUE
      A(2,1)=A(1,2)
      CALL LLBQF (A,2,2,2,B,2,1,0.0,BETA,2,IWK,WK,IER)
      YS(I1)=BETA(1,1)+BETA(2,1)*X(I1)
C     ADVANCE THE STRIP WHILE THE NEXT POINT IS NEARER ITS RIGHT END
   50 CONTINUE
      IF (BX.GE.N) GO TO 60
      IF (I1.GE.N) GO TO 60
      DA=X(I1+1)-X(AX)
      DB=X(BX+1)-X(I1+1)
      IF (.NOT. DA.GT.DB) GO TO 55
      AX=AX+1
      BX=BX+1
      GO TO 50
   55 CONTINUE
   60 CONTINUE
      A1=(AX-1)
   65 CONTINUE
C     RESIDUALS AND THE MEDIAN ABSOLUTE RESIDUAL
      DO 70 I8=1,N
      R(I8)=Y(I8)-YS(I8)
      R1(I8)=ABS(R(I8))
   70 CONTINUE
      CALL PXSORT (R1,1,N)
      L1=(N+1)/2
      L2=(N+2)/2
      MED=(R1(L1)+R1(L2))/2.0
C     BISQUARE ROBUSTNESS WEIGHTS
      DO 85 I9=1,N
      IF ((ABS(R(I9)).GT.0.0).AND.(ABS(MED).GT.0.0)) GO TO 71
      WX(I9)=1.0
      GO TO 80
   71 RU=R(I9)/(6.0*MED)
      IF (.NOT. ABS(RU).LT.1.0) GO TO 75
      WX(I9)=(1.0-(RU**2))**2
      GO TO 80
   75 CONTINUE
      WX(I9)=0.0
   80 CONTINUE
   85 CONTINUE
C     TEST
      WRITE (6,991) (WX(L),L=1,N)
  991 FORMAT (1X,10F7.3)
C     END TEST
      ROB=ROB+1
      IF (.NOT. ROB.GE.2) GO TO 4
      DO 90 I10=1,N
      WRITE (IF2,900) X(I10),Y(I10),YS(I10)
   90 CONTINUE
      STOP
  900 FORMAT (1X,3F15.3)
  901 FORMAT (2F15.3)
      END

      SUBROUTINE XYSORT (A,B,II,JJ)
C     SORTS A(II),...,A(JJ) INTO ASCENDING ORDER, CARRYING B ALONG
      DIMENSION A(JJ),B(JJ),IU(16),IL(16)
      M=1
      I=II
      J=JJ
    5 IF (I .GE. J) GO TO 70
   10 K=I
      IJ=(I+J)/2
      T=A(IJ)
      T1=B(IJ)
      IF (A(I) .LE. T) GO TO 20
      A(IJ)=A(I)
      A(I)=T
      T=A(IJ)
      B(IJ)=B(I)
      B(I)=T1
      T1=B(IJ)
   20 L=J
      IF (A(J) .GE. T) GO TO 40
      A(IJ)=A(J)
      A(J)=T
      T=A(IJ)
      B(IJ)=B(J)
      B(J)=T1
      T1=B(IJ)
      IF (A(I) .LE. T) GO TO 40
      A(IJ)=A(I)
      A(I)=T
      T=A(IJ)
      B(IJ)=B(I)
      B(I)=T1
      T1=B(IJ)
      GO TO 40
   30 A(L)=A(K)
      A(K)=TT
      B(L)=B(K)
      B(K)=TT1
   40 L=L-1
      IF (A(L) .GT. T) GO TO 40
      TT=A(L)
      TT1=B(L)
   50 K=K+1
      IF (A(K) .LT. T) GO TO 50
      IF (K .LE. L) GO TO 30
      IF (L-I .LE. J-K) GO TO 60
      IL(M)=I
      IU(M)=L
      I=K
      M=M+1
      GO TO 80
   60 IL(M)=K
      IU(M)=J
      J=L
      M=M+1
      GO TO 80
   70 M=M-1
      IF (M .EQ. 0) RETURN
      I=IL(M)
      J=IU(M)
   80 IF (J-I .GE. 11) GO TO 10
      IF (I .EQ. II) GO TO 5
      I=I-1
   90 I=I+1
      IF (I .EQ. J) GO TO 70
      T=A(I+1)
      T1=B(I+1)
      IF (A(I) .LE. T) GO TO 90
      K=I
  100 A(K+1)=A(K)
      B(K+1)=B(K)
      K=K-1
      IF (T .LT. A(K)) GO TO 100
      A(K+1)=T
      B(K+1)=T1
      GO TO 90
      END
$ENTRY


The following LOWS EXEC routine sets the file
definitions and invokes the appropriate system libraries
required to execute LOWESS. This routine is executed by
typing "LOWS EXEC."


GLOBAL MACLIB IMSLSP NONIMSL
FILEDEF 02 DISK LOW2 DATA A (PERM
FILEDEF 03 DISK LOW3 DATA A (PERM
FILEDEF 04 DISK LOW4 DATA A (PERM
FILEDEF 07 DISK LOW7 DATA A (PERM
FILEDEF 08 DISK LOW8 DATA A (PERM


APPENDIX  C 
DATA  SETS 

This appendix contains the four data sets that were used
to compare LOWESS with the MOVING AVERAGE, COSINE ARCH and
LEAST SQUARES REGRESSION routines in Chapter III. They
include:

1.  TEST SET ONE ... used to test LOWESS' ability to
    detect and follow linear trends.

2.  TEST SET TWO ... used to check LOWESS' performance on
    data sets that contain abrupt changes in curvature.

3.  TEST SET THREE ... used to test LOWESS' ability to
    follow smooth changes in curvature.

4.  Lag-1 points from NEAR(1) data ... used to check
    LOWESS' performance on unequally spaced data.


TABLE IX

Data Set One

      X       Y         X       Y         X       Y

   .200   -.398    10.200   8.696    20.200  21.320
   .400   -.811    10.400  10.303    20.400  19.996
   .600   -.103    10.600  10.997    20.600  21.018
   .800   1.136    10.800  10.273    20.800  21.047
  1.000   1.633    11.000  11.345    21.000  21.704
  1.200   1.416    11.200  10.477    21.200  21.832
  1.400   1.136    11.400  12.668    21.400  20.408
  1.600   3.402    11.600  11.369    21.600  23.367
  1.800   1.137    11.800  12.378    21.800  21.418
  2.000   2.110    12.000  14.180    22.000  21.089
  2.200   1.481    12.200  12.638    22.200  21.204
  2.400   2.821    12.400  13.733    22.400  23.393
  2.600    .669    12.600  12.851    22.600  22.441
  2.800   3.460    12.800  12.490    22.800  23.304
  3.000   1.897    13.000  12.077    23.000  22.802
  3.200   3.097    13.200  12.815    23.200  23.039
  3.400   2.340    13.400  14.338    23.400  23.811
  3.600   2.361    13.600  14.463    23.600  22.421
  3.800   1.911    13.800  12.763    23.800  23.322
  4.000   3.026    14.000  13.807    24.000  22.419
  4.200   4.412    14.200  12.900    24.200  23.249
  4.400   4.893    14.400  14.707    24.400  24.703
  4.600   6.147    14.600  13.369    24.600  23.373
  4.800   3.443    14.800  14.053    24.800  24.870
  5.000   2.832    15.000  12.204    25.000  24.603
  5.200   4.171    15.200  13.897    25.200  26.389
  5.400   3.238    15.400  18.607    25.400  26.764
  5.600  illeg.    15.600  16.136    25.600  26.238
  5.800   5.487    15.800  16.098    25.800  26.291
  6.000   3.406    16.000  16.284    26.000  26.801
  6.200   6.332    16.200  17.160    26.200  23.433
  6.400   6.939    16.400  18.488    26.400  26.764
  6.600   7.300    16.600  18.125    26.600  26.202
  6.800   6.399    16.800  16.603    26.800  27.664
  7.000   6.766    17.000  17.017    27.000  26.822
  7.200   8.650    17.200  17.446    27.200  29.074
  7.400   9.236    17.400  16.546    27.400  27.372
  7.600   7.217    17.600  18.738    27.600  28.872
  7.800   7.935    17.800  17.962    27.800  27.763
  8.000   7.033    18.000  19.337    28.000  26.499
  8.200   8.239    18.200  18.006    28.200  28.565
  8.400   9.163    18.400  20.031    28.400  28.201
  8.600   8.003    18.600  16.701    28.600  27.210
  8.800   8.930    18.800  20.623    28.800  29.029
  9.000   9.033    19.000  17.482    29.000  29.271
  9.200   8.373    19.200  18.149    29.200  28.834
  9.400   8.860    19.400  19.430    29.400  30.777
  9.600  11.480    19.600  18.143    29.600  28.802
  9.800   8.796    19.800  20.267    29.800  28.863
 10.000   9.303    20.000  20.343    30.000  29.998


TABLE X

Data Set Two

      X       Y         X       Y         X       Y         X       Y

   .200   -.462    11.200   3.849    22.200   4.819    33.200   1.637
   .400  -2.191    11.400   4.354    22.400   4.469    33.400   2.243
   .600   1.403    11.600   3.182    22.600   4.997    33.600    .862
   .800    .947    11.800   3.139    22.800   6.236    33.800   3.226
  1.000    .473    12.000   4.318    23.000   6.278    34.000   1.362
  1.200    .832    12.200   3.736    23.200   6.490    34.200   2.923
  1.400   -.137    12.400   4.989    23.400   3.499    34.400   2.736
  1.600   2.336    12.600   3.732    23.600   5.860    34.600   1.736
  1.800    .779    12.800   3.163    23.800   4.323    34.800   2.129
  2.000   2.397    13.000   4.032    24.000   4.949    35.000   1.433
  2.200   1.144    13.200   3.394    24.200   6.690    35.200   1.313
  2.400   1.832    13.400   3.893    24.400   6.339    35.400   2.736
  2.600   -.406    13.600   3.747    24.600   3.899    35.600   1.376
  2.800    .419    13.800   4.171    24.800   4.233    35.800    .363
  3.000   2.446    14.000   4.962    25.000   3.823    36.000   2.933
  3.200    .641    14.200   3.336    25.200   3.742    36.200    .266
  3.400   1.937    14.400   4.792    25.400   4.873    36.400   1.664
  3.600   1.080    14.600   3.393    25.600   3.497    36.600    .323
  3.800   1.384    14.800   4.630    25.800   7.697    36.800    .783
  4.000    .231    15.000   3.203    26.000   4.600    37.000   1.419
  4.200    .410    15.200   4.463    26.200   3.374    37.200   1.997
  4.400   2.743    15.400   6.338    26.400   2.242    37.400    .333
  4.600   1.793    15.600   3.484    26.600   4.078    37.600   1.137
  4.800   1.121    15.800   2.766    26.800   4.090    37.800    .306
  5.000   1.235    16.000   4.633    27.000   3.319    38.000    .671
  5.200   2.942    16.200   2.812    27.200   6.631    38.200   -.612
  5.400   2.104    16.400   3.668    27.400   3.513    38.400    .376
  5.600   2.733    16.600   3.053    27.600   3.141    38.600   1.921
  5.800   2.717    16.800   3.319    27.800   4.818    38.800   -.476
  6.000   3.136    17.000   3.374    28.000   1.431    39.000  -1.014
  6.200   2.880    17.200   6.472    28.200   3.936    39.200   1.788
  6.400   1.219    17.400   4.420    28.400   4.203    39.400   1.306
  6.600   3.013    17.600   4.623    28.600   3.202    39.600    .833
  6.800   3.843    17.800   3.396    28.800   1.977    39.800  -1.468
  7.000   3.329    18.000   3.778    29.000   4.046    40.000   1.334
  7.200    .303    18.200   3.763    29.200   3.971    40.200   -.342
  7.400   2.686    18.400   4.290    29.400   4.173    40.400  -2.331
  7.600   2.717    18.600   4.900    29.600   4.383    40.600   1.163
  7.800   3.438    18.800   2.397    29.800   3.479    40.800    .627
  8.000   2.689    19.000   6.039    30.000   4.621    41.000    .073
  8.200   3.278    19.200   3.894    30.200   1.989    41.200    .332
  8.400   4.967    19.400   6.093    30.400   4.408    41.400   -.697
  8.600   4.288    19.600   4.174    30.600   3.896    41.600   1.696
  8.800   3.788    19.800   3.613    30.800   3.112    41.800    .039
  9.000   2.677    20.000   3.820    31.000   3.422    42.000   1.797
  9.200   3.610    20.200   4.844    31.200   4.740    42.200    .264
  9.400   3.908    20.400   5.602    31.400   3.108    42.400    .872
  9.600   3.283    20.600   4.933    31.600   3.892    42.600  -1.446
  9.800   3.383    20.800   3.634    31.800   1.630    42.800   -.701
 10.000   4.413    21.000   4.003    32.000   4.039    43.000   1.246
 10.200   3.378    21.200   4.389    32.200   4.600    43.200   -.639
 10.400   1.396    21.400   6.543    32.400   2.123    43.400    .577
 10.600   2.962    21.600   4.340    32.600   1.623    43.600   -.360
 10.800   3.203    21.800   3.417    32.800   1.602    43.800   -.136
 11.000   4.682    22.000   3.613    33.000   3.180    44.000  -1.349


TABLE XI

Data Set Three

      X       Y         X       Y         X       Y

   .063    .261     2.135    .360     4.208  -1.733
   .126   -.129     2.198    .716     4.270   -.860
   .188    .053     2.261   1.376     4.333    .049
   .251   -.293     2.324    .410     4.396   -.870
   .314   1.316     2.386    .988     4.459  -1.282
   .377   1.340     2.449    .326     4.522  -1.701
   .440    .333     2.512    .873     4.584  -1.025
   .502   1.431     2.575    .173     4.647   -.811
   .565    .088     2.638   1.079     4.710   -.891
   .628    .433     2.700    .520     4.773  -1.088
   .691    .913     2.763   1.167     4.836   -.980
   .754    .322     2.826    .471     4.898   -.662
   .816   1.398     2.889    .684     4.961   -.308
   .879   1.381     2.952    .833     5.024  -1.729
   .942    .011     3.014    .344     5.087   -.399
  1.005    .310     3.077   -.129     5.150  -1.211
  1.068    .496     3.140   -.053     5.212   -.393
  1.130   1.113     3.203   -.343     5.275  -1.131
  1.193    .713     3.266  -1.152     5.338   -.195
  1.256   1.304     3.328   -.111     5.401   -.273
  1.319   1.082     3.391    .024     5.464  -1.133
  1.382    .474     3.454   -.180     5.526   -.982
  1.444   1.062     3.517   -.320     5.589    .206
  1.507    .624     3.580   -.633     5.652   -.113
  1.570    .686     3.642    .088     5.715  -1.503
  1.633   1.693     3.705   -.339     5.778   -.228
  1.696    .168     3.768    .216     5.840   -.232
  1.758   -.023     3.831   -.223     5.903   -.824
  1.821   1.213     3.894    .052     5.966   -.949
  1.884    .174     3.956  -1.417     6.029   -.078
  1.947    .860     4.019   -.899     6.092   -.788
  2.010   1.028     4.082   -.310     6.154    .203
  2.072    .743     4.145    .074     6.217   -.100

TABLE XII

Lag-1 Data Derived from NEAR(1) Process

      X       Y

  1.020    .466
   .035   1.020
   .129    .035
   .125    .129
   .153    .125
   .233    .153
  2.077    .233
  2.155   2.077
  1.821   2.155
   .042   1.821
   .036    .042
   .061    .036
   .149    .061
  4.260    .149
  4.095   4.260
  3.422   4.095
  2.854   3.422
  2.609   2.854
  2.176   2.609
  1.823   2.176
  1.617   1.823
  2.439   1.617
  2.047   2.439
  1.840   2.047
  3.049   1.840
  2.682   3.049
  2.239   2.682
  1.889   2.239
  1.577   1.889
  1.664   1.577
   .103   1.664
   .133    .103
   .145    .133
   .207    .145
   .221    .207
   .196    .221
   .170    .196
   .185    .170
   .087    .185
  2.258    .087
  1.938   2.258
  1.617   1.938
  1.346   1.617
  1.184   1.346
  1.007   1.184
   .853   1.007
   .779    .853
   .727    .779
   .822    .727
   .871    .822
   .747    .871
  1.385    .747
  1.189   1.385
   .017   1.189
   .261    .017
   .366    .261
   .349    .366
   .364    .349
  1.140    .364
  1.020   1.140
  3.508   1.020
  3.122   3.508
  2.623   3.122
  2.654   2.623
   .209   2.654
   .255    .209
   .271    .255
  1.185    .271
   .989   1.185
  2.867    .989
  2.488   2.867
  2.086   2.488
  1.756   2.086
  1.530   1.756
  1.456   1.530
   .180   1.456
   .429    .180
   .031    .429
  2.951    .031